Thursday, April 27, 2023

Using JSLTTransformJSON (an alternative to JoltTransformJSON)

As of Apache NiFi 1.19.0, there is a processor called JSLTTransformJSON. It is similar in function to the JoltTransformJSON processor, but it uses JSLT as the Domain-Specific Language (DSL) for transforming one JSON structure into another. This processor was added because some users found JOLT too complex to learn, as it uses a "tree-walking" pattern to change the JSON structure rather than a more programmatic DSL (from the docs, JSLT is based on jq, XPath, and XQuery). To get a feel for JSLT, check out the demo playground, which has a ready-made example.
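To give a quick flavor of the language before diving in, here is a small standalone JSLT expression (an illustrative snippet of my own, not from the NiFi docs) that would build a new object from an input like {"first": "Jane", "last": "Doe", "friends": [...]}, using dot-notation and the built-in size() function:

{
  "fullName" : .first + " " + .last,
  "numFriends" : size(.friends)
}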

For this post, I'll show a couple of examples, starting with the Inception example from the JOLT playground, as a comparison between the equivalent JOLT and JSLT expressions:

Input:

{
  "rating": {
    "primary": {
      "value": 3
    },
    "quality": {
      "value": 3
    }
  }
}

JOLT Transform:

[
  {
    "operation": "shift",
    "spec": {
      "rating": {
        "primary": {
          "value": "Rating"
        },
        "*": {
          "value": "SecondaryRatings.&1.Value",
          "$": "SecondaryRatings.&1.Id"
        }
      }
    }
  },
  {
    "operation": "default",
    "spec": {
      "Range": 5,
      "SecondaryRatings": {
        "*": {
          "Range": 5
        }
      }
    }
  }
]

Equivalent JSLT Transform:

{
  "Rating" : .rating.primary.value,
  "SecondaryRatings" : {for (.rating)
    .key : {
      "Id" : .key,
      "Value" : .value.value,
      "Range" : 5
    }
    if (.key != "primary")
  },
  "Range" : 5
}
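Both transforms produce the same output for the input above, with the "primary" rating promoted to the top level and the remaining ratings collected under SecondaryRatings:

{
  "Rating" : 3,
  "SecondaryRatings" : {
    "quality" : {
      "Id" : "quality",
      "Value" : 3,
      "Range" : 5
    }
  },
  "Range" : 5
}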

These both kind of "walk the structure", but in my opinion the JSLT is more succinct and more readable for at least these reasons:

  • There is no need for a "default" spec like JOLT
  • There are fewer levels of JSON objects in the specification
  • You don't have to walk the structure one level at a time because you can refer to lower levels using the dot-notation  
  • You don't have to "count the levels" to use the & notation to walk back up the tree

Learning JSLT may not be easier than learning JOLT, but you may find it more approachable if you are already familiar with the tools JSLT is based on. At the end of the day, they are both designed to modify the structure of input JSON and just approach the DSL in different ways. And NiFi offers the choice between them :)

As always I welcome all comments, suggestions, and questions. Cheers!



Transform JSON String Field Into Record Object in NiFi

One question that has come up often in the NiFi community is: "I have a field in my record(s) that is a string that actually contains JSON. Can I change that field to have the JSON object as the value?" It turns out the answer is yes! It is possible via ScriptedTransformRecord, and in this post I will show a couple of cool techniques that work with any format, not just JSON.

To start, I have a flow with a GenerateFlowFile (for test input) and a ScriptedTransformRecord whose success relationship is sent to a funnel (just for illustration):


For GenerateFlowFile, I set the content to some test input with a field "myjson" which is a string containing JSON:

[
  {
    "id": 1,
    "myjson": "{\"a\":\"hello\",\"b\": 100}"
  },
  {
    "id": 2,
    "myjson": "{\"a\":\"world\",\"b\": 200}"
  }
]


You can see the "myjson" field is a string with a JSON object having two fields "a" and "b". With ScriptedTransformRecord I will need to look for that record field by name and parse it into a JSON object, which is really easy with Groovy. This is the entire script, with further explanation below:

import org.apache.nifi.serialization.record.*
import org.apache.nifi.serialization.record.util.*
import java.nio.charset.*
import groovy.json.*

def recordMap = record.toMap()
// Copy the field map so the "myjson" value can be replaced
def derivedValues = new HashMap(recordMap)
def slurper = new JsonSlurper()
recordMap.each { k, v ->
    if ('myjson'.equals(k)) {
        // Replace the String value with the parsed JSON object (a Map)
        derivedValues.put(k, slurper.parseText(v))
    }
}
// Infer the new schema (null name = root record) and return the updated record
def inferredSchema = DataTypeUtils.inferSchema(derivedValues, null, StandardCharsets.UTF_8)
return new MapRecord(inferredSchema, derivedValues)

The first thing the script does (after importing the necessary packages) is get the existing record fields as a Java Map. Then a copy of the map is made so we can change the value of the "myjson" field. The record map is iterated over, giving each key/value pair where the key is the record field name and the value is the record field value. I only want to change the "myjson" field, so I replace its String value with the parsed Java Object. This is possible because the maps are of type Map<String, Object>.

However, I don't know the new schema of the overall record because "myjson" is no longer a String but a sub-record. I used DataTypeUtils (with a null record name as it's the root record) and its inferSchema() method to get the schema of the new output record, then I return a new record (instead of the original) using that schema and the updated map. Using a JsonRecordSetWriter in ScriptedTransformRecord gives the following output for the above input FlowFile:

[ {
  "id" : 1,
  "myjson" : {
    "a" : "hello",
    "b" : 100
  }
}, {
  "id" : 2,
  "myjson" : {
    "a" : "world",
    "b" : 200
  }
} ]

The field-parsing technique (JsonSlurper in this case) can be extended to other formats as well: basically any string that can be converted into a Map will work. Groovy also has an XmlSlurper, so the field could contain XML instead of JSON, for example.
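Here is a minimal sketch of that XML variant (my own illustrative code, not from the original flow, assuming a hypothetical field "myxml" containing something like "<data><a>hello</a><b>100</b></data>"). XmlSlurper returns a node tree rather than a Map, so a small helper converts it:

// XmlSlurper lives in groovy.xml in Groovy 3+ (groovy.util in older versions)
import groovy.xml.XmlSlurper

// Hypothetical helper: convert parsed XML nodes into nested Maps, with leaf
// elements becoming their text content (XML has no numeric types, so "100"
// stays a String here)
def xmlToMap
xmlToMap = { node ->
    node.children().size() == 0 ? node.text()
        : node.children().collectEntries { [(it.name()): xmlToMap(it)] }
}

// Then, inside the each loop, for a field "myxml" holding an XML string:
derivedValues.put(k, xmlToMap(new XmlSlurper().parseText(v)))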

The schema inference technique can be used in many cases, not just for replacing a single field. It only takes the root-level Map, a name of null, and a character set (UTF-8 in this post), and produces a new schema for the outgoing record. You can even produce multiple output records for a single input record, but you'll want them all to have the same schema so that every record in the outgoing FlowFile matches.
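Here is a minimal sketch of that multi-record case (my own illustrative code, with a hypothetical "myitems" field whose String value is a JSON array such as "[10, 20, 30]"); my understanding is that returning a collection of records from the script is how multiple output records are produced, and the schema is inferred once so all of them share it:

import org.apache.nifi.serialization.record.*
import org.apache.nifi.serialization.record.util.*
import java.nio.charset.*
import groovy.json.*

// Emit one output record per element of the "myitems" array
def items = new JsonSlurper().parseText(record.getAsString('myitems'))
def outMaps = items.collect { item ->
    def m = new HashMap(record.toMap())
    m.put('myitems', item)  // replace the array string with a single element
    m
}
// Infer the schema once so every output record shares it
def schema = DataTypeUtils.inferSchema(outMaps[0], null, StandardCharsets.UTF_8)
return outMaps.collect { new MapRecord(schema, it) }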

As always, I welcome all comments, questions, and suggestions. Cheers!