Mark Nadelson
Mark Nadelson has been a professional Software Engineer for the past 28 years. During his career he has worked in a number of industries including Telecom, Internet, and Finance. He has written two books and numerous articles for a variety of software development publications. He is a self proclaimed nerd and wears that title proudly always looking to expand his knowledge. When he is not needing out he is enjoying time with his kids and over weight bull dog.

Most software developers who have coded systems that require inter-process communication (whether for asynchronous message-driven systems or for synchronous request/reply services) use the beloved JavaScript Object Notation (JSON) as the message payload format of choice. Why choose JSON? Most likely it was chosen because it is human-readable with attribute-value pairs and easily serialized data types such as arrays, lists, and maps. Other reasons include the fact that it uses a language-independent data format and many languages include prewritten libraries to generate and parse JSON-formatted data.

These are all great reasons to choose JSON, and they are why I have chosen JSON as the message format for many of my own applications.

When working with JSON there are times when you need to search a message’s content. This search may be for a host of reasons including support issues, debugging problems, replaying messages, or joining the content of the JSON document with other data sources within your environment.

To search multiple JSON documents, you can store them within a document database and then search for strings within the document. The problem with this approach is that JSON is nested, so you may come up with search results that have the text you are looking for, but not within the hierarchy of the document in which you were expecting to find it.

If your database supports XPATH and/or XQUERY, you can use these tools to search your JSON documents. XPATH and XQUERY were originally meant for XML documents, but can be used for JSON as well. The syntax for XPATH and XQUERY is a bit cryptic and non-intuitive. In my experience, there is a steep learning curve and it requires lots of practice to get it right.

A final way to store and search JSON data is by storing the fields in a relational database. SQL is a great way to search and filter exactly the data that you are requesting. Unfortunately, to get JSON documents in a relational database, you must transform the JSON document into fields, values, and sub-tables (if your JSON is nested). Depending on the complexity of your JSON, this could be a daunting task. What makes matters even more complicated is that if the JSON document evolves, you will also have to evolve your JSON to relational database transformation code.

The ideal is to be able to store the JSON document directly within a relational database table, then apply searches and filters against the document in a very intuitive and simple manner. It would be even better if you could then join to other relational tables to create complex queries.

PostgreSQL is one such database that gives you this capability. The remainder of this article describes how to use PostgreSQL’s JSON features as well as a real-world example that stores and searches stored JSON data for reporting purposes.

Overview of PostgreSQL JSON features

PostgreSQL version 9.2 and later supports a native JSON data type. There are many functions and operators for manipulating JSON data within the database. I will cover some of the features within this article, but this is by no means exhaustive. Go to https://www.postgresql.org/docs/current/functions-json.html for the full reference guide.

For this exercise, create a table called published_messages. You can persist all messages you publish in this table for the purpose of debugging, analysis, and possible replay.

CREATE TABLE published_messages (
  ID SERIAL NOT NULL PRIMARY KEY,
  message_key VARCHAR(50) NOT NULL,
  message_value  JSON NOT NULL
)

This table contains an ID column defining order, a message_key column that has the messages’ key values, and a message_value column that is the actual JSON message value.

To insert data, you need valid formatted JSON to go into the message_value column. The messages passed for these examples are stock trades that have been filled by the exchange.

INSERT into published_messages(message_key, message_value)
VALUES ('FillID1', '{ "Symbol": "IBM", "SharesFilled": 100, "Price": 50.5, "OriginalOrder": {"Quantity": 1000, "TradePrice": 50.4}}')

If your JSON is not formatted property, PostgreSQL will fail the insert and notify you of the error.

Now that you have some stored data, you can run queries by selecting the entire JSON document or pieces of the document and filtering on values within the document.

Selecting the entire JSON document is as simple as running:

SELECT message_value from published_messages

PostgreSQL provides two native operators for querying the internals of JSON documents: -> and ->>. The -> operator returns the value as a JSON object and the ->> operator returns the value as text.

If you want to get all the symbols, fills, and prices, you can run the following query which returns the values as JSON objects:

SELECT message_value -> 'Symbol' AS symbol,
       message_value -> 'SharesFilled' AS amount_filled,
       message_value -> 'Price' AS filled_price
FROM published_messages

To run calculations upon the values, you need to CAST the returned values to their native types. The following calculates the dollar value of the shares filled by the exchange:

SELECT message_value -> 'Symbol' AS symbol,
       CAST(message_value ->> 'SharesFilled' AS INTEGER) *
       CAST(message_value ->> 'Price' AS numeric)
FROM published_messages

Let’s say you want to get the information from the nested OriginalOrder field. You simply change the fields together in your SELECT:

SELECT message_value -> 'symbol' AS symbol,
       message_value -> 'OriginalOrder' -> 'Quantity' AS OriginalQuantity,
       message_value -> 'OriginalOrder' -> 'TradePrice' AS TradePrice
FROM published_messages

Finally, you can use PostgreSQL’s JSON selection features to filter your query. To get all messages for symbol = "IBM", apply your JSON selection to the WHERE clause:

SELECT * FROM published_messages
WHERE message_value ->> 'symbol' = 'IBM'

You use the ->> operator to get the text version of symbol and match it against "IBM".

Stock order calculator

This use case builds upon the JSON example in the last section. You are building an application that captures the stock orders filled from the stock exchange. The application receives messages and stores them in your PostgreSQL database. The application also has a couple of RESTful APIs that report on the current status of stock orders. The user can request a report that contains all stock symbols and their associated order details or a single symbol and its order details. You will be able to do all these actions using a single PostgreSQL table, making your application quite simple.

I have written a simple Spring Boot application in Java, and the full source code can be found at https://github.com/mnadelson/PostgressJson. I will present relevant code snippets during the discussion of the application.

There are two classes responsible for sending the stock fill JSON messages and receiving and processing the fill messages. FillOutgoingProcessor simulates the sending of stock JSON messages via a TCP socket. It initializes with five stock orders with a quantity of 10,000 shares. It sends a partially filled order over the socket connection every 5 seconds until the order is completely filled.

@Scheduled(initialDelay = 5000, fixedDelay = 5000)
public void sendOutgoingFills() {
  try {
    Trade trade = tradeList.get(random.nextInt(5));
    if (!trade.isFilled()) {
      int fillQuantity = random.nextInt(1000);
      fillQuantity = trade.applyFill(fillQuantity);
      double price = Math.round(random.nextDouble() * 1000.0) / 100.0;
      String json = trade.getJson(fillQuantity, price);
        logger.info("Sending Outgoing Fill: " + json);
      bufferedWriter.write(json + "\n");
      bufferedWriter.flush();
    }
  }
  catch(Exception ex) {
    logger.error("Error Sending Fill", ex);
  }
}

FillIncomingProcessor reads the stock fill JSON message from the socket and persists it in a PostgreSQL table called StockFills. StockFills contains two columns: an ID (identity) column to track order of inserts and a fill column that stores the JSON. The CREATE statement for the table is as follows:

CREATE TABLE StockFills (
  ID SERIAL NOT NULL PRIMARY KEY, 
  Fill JSON NOT NULL
)

After receiving the stock fill message, FillIncomingProcessor calls the persistFill method, passing the JSON. persistFill stores the fill in the database table using a JDBC connection returned from the postgresConnection method.

private void persistFill(String json) throws SQLException {
  PreparedStatement preparedStatement = postgresConnection().prepareStatement(
    "insert into StockFills(fill) values (?::JSON)");
  preparedStatement.setObject(1, json);
  preparedStatement.executeUpdate();
  preparedStatement.close();
}

Stock order status reports are generated within the StockOrderRestProcessor. There are two RESTful calls: /report and /report/<stock symbol>. The /report call returns a report with all stock orders, and the /report/<stock symbol> call returns a report with the single symbol requested. The HTML report is generated within the report method. The report method is overloaded to take or not take a stock symbol.

The interesting piece of this is the actual PostgreSQL SQLquery:

String sql = "SELECT \n" +
  "  fill->>'Symbol' AS Symbol,\n" +
  "  SUM(CAST(fill->>'SharesFilled' AS INTEGER)) AS SharesFilled,\n" +
  "  AVG(CAST(fill->'OriginalOrder'->>'Quantity' AS INTEGER))" +
  "  AS OriginalQuantity,\n" +
  "  SUM(CAST(fill->>'SharesFilled' AS INTEGER) * CAST(fill->>'Price' " +
  "  AS NUMERIC)) /\n" +
  "  SUM(CAST(fill->>'SharesFilled' AS INTEGER)) AS AverageFilledPrice,\n" +
  "  AVG(CAST(fill->'OriginalOrder'->>'TradePrice' AS NUMERIC)) " +
  "  AS OriginalTradePrice\n" +
  "  FROM stockfills";

It extracts a piece of the JSON as shown when getting the symbol fill->>’Symbol’ as Symbol. It also does various aggregations such as computing the total shares filled thus far for a symbol:

SUM(CAST(fill->>'SharesFilled' AS INTEGER)) AS SharesFilled

More complicated calculations, for example computing the weighted average price of shared filled, are done as follows:

SUM(CAST(fill->>'SharesFilled' AS INTEGER) * /
  CAST(fill->>'Price' AS NUMERIC)) /
  SUM(CAST(fill->>'SharesFilled' AS INTEGER)) AS AverageFilledPrice

The SQL results are aggregated into an HTML report and returned to the user.

Symbol Shares Filled Original Quantity Average Filled Price Original Trade Price
YHOO 2313 10000 8.20 3.29
MSFT 4253 10000 5.49 7.18
GOOGL 2669 10000 5.86 2.80
IBM 2223 10000 2.98 7.30
F 4923 10000 6.09 2.16

Conclusion

Using JSON as a message format provides a rich and hierarchical means of representing your data. PostgreSQL’s JSON data storage features provide an easy way of storing, extracting, aggregating, and filtering JSON documents without any transformation.

There is no need to write an ETL process to take the JSON document and parse it into fields and sub-tables for the means of storing it in a relational database. You are also spared the task of constantly having to update the ETL job when the JSON message evolves and changes. PostgreSQL’s rich JSON features allow you to work directly with the JSON document and perform the same operations as if it were in one or more relational tables.

How to work with us

  • Contact us to set up a call.
  • We will analyze your needs and recommend a content contract solution.
  • Sign on with ContentLab.
  • We deliver topic-curated, deeply technical content to you.

To get started, complete the form to the right to schedule a call with us.

Send this to a friend