Cutting Edge: Querying JSON Data in SQL Server 2016

April 2017

Volume 32, number 4

By Dino Esposito | April 2017


Transferring data across independent, autonomous systems is much of what software does today, and JSON is the ubiquitous language behind that data transfer. The acronym "JSON" stands for "JavaScript Object Notation." JSON provides a text-based way to capture the state of an object so that the state can easily be serialized and transferred from one system to the next, especially across heterogeneous systems.

JSON has succeeded where XML ultimately failed: becoming the lingua franca of the web. Personally, I don't find JSON any easier to read than XML, but JSON is a text format that is more compact and simpler than XML, and it can still be edited by humans. Computers on a long list of software and hardware platforms can quickly parse and understand it.

A JSON string is plain text. Any version of a relational database management system (RDBMS), including SQL Server, can store a string regardless of the layout of its content. However, SQL Server 2016 is the first version of Microsoft's database that lets you expose existing table data as JSON, store table data as JSON and, more importantly, query JSON strings as if the JSON content actually were a collection of individual columns.

A structured and comprehensive overview of the JSON functions in SQL Server 2016 can be found in the MSDN documentation at bit.ly/2llab1n. An excellent summary of JSON in SQL Server 2016 is also available in the Simple Talk article at bit.ly/26rprwv. This column takes a more business-oriented view of JSON in SQL Server 2016 and offers a scenario-based perspective on using JSON data in a relational persistence layer.

JSON data in the persistence layer

Two verbs are key to understanding the purpose of JSON: transmit and serialize. JSON is the format in which you serialize the state of a software entity so that it can be transmitted across process spaces with the assurance that it will be well understood on both ends. That all sounds good, but this is a column about JSON in SQL Server, and therefore about JSON in the persistence layer. So let's start with the basic question: When would you store data as JSON in SQL Server?

A relational database table is defined with a fixed number of columns, and each column has its own data type: variable- or fixed-length strings, dates, numbers, Booleans and so on. JSON is not a native data type; a SQL Server column that contains JSON data is, from the database perspective, just a string column. You can write JSON data to a table column as you would any regular string, and you can do that in every version of SQL Server, as well as in any other RDBMS.

Where do the JSON strings you eventually store in a database come from? There are two main scenarios: First, such strings may come from a web service or some other external endpoint that transmits data (for example, a connected device or sensor). Second, JSON data can be a convenient way to group related pieces of information so that they travel as a single data item. This typically happens when you deal with semi-structured data, such as data representing a business event to be stored in an event sourcing scenario or, more simply, in a business context that is inherently event-driven (for example, real-time systems for domains such as finance, trading, scoring, monitoring, industrial automation and control). In all of these cases, your storage can be normalized to a structured form by serializing related variable-length and variable-format information into a single data item that fits in the string column of a relational table.

As mentioned earlier, the JSON content you want to persist can come from an external source or be generated via serialization from instances of C# objects:
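The original code listing isn't included here; a minimal sketch, assuming a hypothetical Country class and the popular Newtonsoft.Json serializer, might look like this:

```csharp
using Newtonsoft.Json;

public class Country
{
    public string CountryName { get; set; }
    public string CapitalCity { get; set; }
    public long Population { get; set; }
}

// Serialize the state of a C# object into a JSON string
// that can be stored in a plain NVARCHAR column.
var country = new Country
{
    CountryName = "Brazil",
    CapitalCity = "Brasilia",
    Population = 207847528
};
var json = JsonConvert.SerializeObject(country);
```

The resulting string can then be written to the database like any other text value.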

You can also use Entity Framework (EF) to store JSON data in a column of a database table.

SQL Server 2016 goes one step further and lets you transform JSON data into table rows. This capability can save a lot of work, and quite a few CPU cycles, in your code because you can now push raw JSON text to the database without first parsing it into C# objects in the application code and then passing it to the database via EF or direct ADO.NET calls. The key to achieving this is the new OPENJSON function:
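A sketch of how OPENJSON can turn raw JSON text into rows (the table and property names are assumptions based on the Countries example used later in this column):

```sql
DECLARE @json NVARCHAR(MAX) = N'[
  { "CountryName": "Brazil", "Population": 207847528 },
  { "CountryName": "Japan",  "Population": 126045000 }
]';

-- Turn the JSON array into a rowset and insert it
-- into a regular table, no C# parsing required.
INSERT INTO Countries (CountryName, Population)
SELECT CountryName, Population
FROM OPENJSON(@json)
     WITH (CountryName NVARCHAR(100) '$.CountryName',
           Population  BIGINT        '$.Population');
```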

You can use this function to insert or update regular table rows from plain JSON text. The WITH clause lets you map JSON properties to existing table columns.

The event sourcing scenario

In my December 2016 column (msdn.com/magazine/mt790196), I described event sourcing as an increasingly popular pattern for storing the historical state of an application. Instead of saving the latest known good state, event sourcing saves every single business event that alters the state, and rebuilds the current state by replaying the logged events.

The key aspect of an event sourcing implementation is how effectively past events can be stored and retrieved. Every event is different and may have a different schema, depending on its type and the information available. At the same time, having a distinct (relational) store for each type of event is problematic, because events occur asynchronously and may affect different entities and different segments of the state. If you store them in different tables, rebuilding the state can be expensive because of cross-table JOINs. Hence, storing events as objects is the most recommended option, and NoSQL stores do this job very well. Is it possible to do event sourcing with a relational database instead?

Storing events as JSON is an option that works with any version of SQL Server, but querying that JSON efficiently over a large number of events may be unacceptably slow. The native JSON features in SQL Server 2016 change the situation, however, and make using SQL Server in an event sourcing scenario a realistic option. But how would you query JSON from a database table?

Query data from JSON content

Suppose you maintain one or more columns of JSON data in a canonical relational table, so columns of primitive data and columns of JSON data live side by side. Without the new features of SQL Server 2016, the JSON columns are treated as plain text fields and can be queried only with the string and text functions of T-SQL, such as LIKE, SUBSTRING and TRIM. For the demo, I created a table called Countries with a few one-to-one relational columns and a column called Serialized that contains the rest of the data set, serialized as JSON. Figure 1 shows this.


Figure 1: The sample Countries table with a JSON column

The JSON object serialized in the example table looks like this:
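The original listing isn't reproduced here, but given the queries that follow, a plausible shape for the serialized object (all property names are assumptions) might be:

```json
{
  "CapitalCity": "Brasilia",
  "Population": 207847528,
  "Continent": "South America",
  "CurrencyCode": "BRL"
}
```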

The following T-SQL query shows how to select only the countries with a population of more than 100 million. The query mixes regular table columns and JSON properties:
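A sketch of such a query, assuming the Countries table sketched above with a relational CountryName column and a Serialized JSON column that holds a Population property:

```sql
-- Mix a regular column (CountryName) with a JSON property
-- extracted from the Serialized column via JSON_VALUE.
SELECT CountryName,
       JSON_VALUE(Serialized, '$.Population') AS Population
FROM Countries
WHERE CAST(JSON_VALUE(Serialized, '$.Population') AS BIGINT) > 100000000;
```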

The JSON_VALUE function takes the name of a JSON column (or a local variable set to a JSON string) and extracts the scalar value that the specified path points to. As shown in Figure 2, the $ symbol refers to the root of the serialized JSON object.


Figure 2: Results of a JSON query

Because the JSON column is configured as a plain NVARCHAR column, you can use the ISJSON function to check whether the content of the column is real JSON. The function returns 1 if the content is valid JSON and 0 otherwise.
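For example, ISJSON can back a CHECK constraint so that only valid JSON can ever be written to the column (the table and column names follow the demo above):

```sql
-- Reject any write that would store non-JSON text
-- in the Serialized column.
ALTER TABLE Countries
ADD CONSTRAINT Countries_Serialized_IsJson
    CHECK (ISJSON(Serialized) > 0);
```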

JSON_VALUE always returns a string of up to 4,000 characters, regardless of the selected property. If you expect a longer return value, you should use OPENJSON instead. In either case, you might want to consider a CAST operation to obtain a value of the proper type. Returning to the previous example, suppose you want the population of a country formatted with thousands separators. (In general, that's not a good idea, because formatting data in the presentation layer gives your code a lot more flexibility.) The T-SQL FORMAT function expects a number as input and throws an error if you pass it the raw JSON value. To make it work, you must resort to an explicit CAST:
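A sketch of the CAST-then-FORMAT pattern, with the same assumed table and property names as above:

```sql
-- FORMAT requires a numeric input, so CAST the NVARCHAR
-- value returned by JSON_VALUE before formatting it.
SELECT CountryName,
       FORMAT(CAST(JSON_VALUE(Serialized, '$.Population') AS BIGINT),
              'N0', 'en-US') AS Population
FROM Countries;
```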

JSON_VALUE can only return a single scalar value. If you have an array or a nested object to extract, you must resort to the JSON_QUERY function.
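For instance, if the serialized object contained a nested array (the Cities property here is purely hypothetical), JSON_QUERY would return it as a JSON fragment rather than a scalar:

```sql
-- JSON_QUERY returns an object or an array,
-- not a scalar value like JSON_VALUE does.
SELECT CountryName,
       JSON_QUERY(Serialized, '$.Cities') AS Cities
FROM Countries;
```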

How effective is querying JSON data? Let's run some tests.

Indexing JSON content in SQL Server 2016

It may seem obvious, but querying the entire JSON string out of the database and then parsing it in memory with a dedicated library, such as Newtonsoft.Json, always works, but might not be an effective approach in all cases. Effectiveness mostly depends on the number of records in the database and how long it really takes to get the data you need in the format you want. For a query that your application runs only occasionally, in-memory processing of JSON data may be an option. In general, though, querying via the dedicated JSON functions and letting SQL Server parse the content internally results in slightly faster code. The difference becomes even bigger when you add an index on the JSON data.

However, you shouldn't create the index on the JSON column itself, because that would index the JSON value as a single string. You'll hardly ever query the entire JSON string or a substring of it; more realistically, you'll query the value of a particular property in the serialized JSON object. A more effective approach is to create one or more computed columns based on the values of one or more JSON properties and then index those columns. Here's an example in T-SQL:
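The original listing isn't included, but a sketch of the technique, using the same assumed Countries table and Population property, might be:

```sql
-- Surface a single JSON property as a typed computed column...
ALTER TABLE Countries
ADD JsonPopulation AS
    CAST(JSON_VALUE(Serialized, '$.Population') AS BIGINT);

-- ...and index that column instead of the raw JSON string.
CREATE INDEX IX_Countries_JsonPopulation
    ON Countries (JsonPopulation);
```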

Again, be aware that JSON_VALUE returns NVARCHAR; unless you add a CAST operation, the index is built on text.

Interestingly, JSON parsing is faster than deserializing special types such as XML and spatial types. More information is available at bit.ly/2kthrrC. In summary, parsing JSON performs at least as well as retrieving the properties of those other types.

JSON and EF

In general, note that JSON support in SQL Server 2016 is expressed mainly through the T-SQL syntax, as tooling is currently quite limited. EF, in particular, doesn't currently provide any facilities to query JSON data, with the exceptions of the SqlQuery method in EF6 and FromSql in EF Core. However, this doesn't mean you can't serialize complex properties of C# classes (such as arrays) into JSON columns. An excellent tutorial for EF Core can be found at bit.ly/2kVEsam.
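As a rough illustration of the FromSql escape hatch in EF Core (the Countries DbSet and column names are assumptions, and the exact FromSql overloads vary by EF Core version):

```csharp
// Fall back to raw SQL so the JSON filtering happens
// inside SQL Server rather than in application memory.
var bigCountries = context.Countries
    .FromSql(
        @"SELECT * FROM Countries
          WHERE CAST(JSON_VALUE(Serialized, '$.Population') AS BIGINT)
                > 100000000")
    .ToList();
```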

Summary

SQL Server 2016 introduces native JSON functions so that stored JSON data can be queried more effectively as a canonical rowset. This mostly happens when the JSON data is the serialized version of some semi-structured aggregate of data. Indexes built on computed columns that reflect the value of one or more JSON properties definitely have a positive impact on performance.

JSON data is stored as plain text and isn't treated as a special type, the way XML and spatial data are. Yet this is exactly what enables you to use JSON columns directly in any SQL Server object, which isn't true of other complex types such as XML, CLR and spatial types, which are still on the waiting list.

In this column, I focused on the JSON-to-rowset scenario. However, SQL Server 2016 also fully supports the rowset-to-JSON query scenario, in which you write a regular T-SQL query and then map the results to JSON objects via the FOR JSON clause. For more information on this feature, see bit.ly/2fTKly7.
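A minimal sketch of the rowset-to-JSON direction, reusing the assumed Countries table from the earlier examples:

```sql
-- FOR JSON PATH serializes the result set as a JSON array,
-- one object per row.
SELECT CountryName,
       JSON_VALUE(Serialized, '$.CapitalCity') AS CapitalCity
FROM Countries
FOR JSON PATH;
```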


Dino Esposito is the author of "Microsoft .NET: Architecting Applications for the Enterprise" (Microsoft Press, 2014) and "Modern Web Applications with ASP.NET" (Microsoft Press, 2016). A technical evangelist for the .NET and Android platforms at JetBrains and a frequent speaker at industry events worldwide, Esposito shares his vision of software at software2cents.wordpress.com and on Twitter at @despos.

Thanks to the following Microsoft technical expert for reviewing this article: Jovan Popovic