Thursday, June 9, 2016

Testing ExecuteScript processor scripts

I've been getting lots of questions about how to develop/debug scripts that go into the ExecuteScript processor in NiFi. One way to do this is to add a unit test to the nifi-scripting-processors submodule, and set the Script File property to your test script. However, that approach requires a checkout of the full NiFi source.

To make things easier, I took a pared-down copy of the ExecuteScript processor (and its helper classes), added nifi-mock as a dependency, and slapped a command-line interface on it. This way, with a single JAR, you can run your script inside a dummy flow containing ExecuteScript.

The result is version 1.1.1 of the NiFi Script Tester utility, on GitHub and Bintray.

It runs your script as a little unit test. You can pipe stdin to become a flowfile, or point it at a directory and it will send every file in that directory as a flowfile. If your script doesn't need a flowfile, run it without an input directory or piped stdin; it will still run once with no input. The usage is as follows:
Usage: java -jar nifi-script-tester-<version>-all.jar [options] <script file>
 Where options may include:
   -success            Output information about flow files that were transferred to the success relationship. Defaults to true
   -failure            Output information about flow files that were transferred to the failure relationship. Defaults to false
   -no-success         Do not output information about flow files that were transferred to the success relationship. Defaults to false
   -content            Output flow file contents. Defaults to false
   -attrs              Output flow file attributes. Defaults to false
   -all-rels           Output information about flow files that were transferred to any relationship. Defaults to false
   -all                Output content, attributes, etc. about flow files that were transferred to any relationship. Defaults to false
   -input=<directory>  Send each file in the specified directory as a flow file to the script
   -modules=<paths>    Comma-separated list of paths (files or directories) containing script modules/JARs
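
The examples in this post don't exercise -input, so here is a minimal sketch of a wrapper around it. The function name test_script_on_dir, the JAR default, and the DRY_RUN switch are my own illustrative choices, not part of the tool; only the flags come from the usage text above.

```shell
#!/bin/sh
# Hypothetical default location of the JAR; override by setting JAR.
JAR="${JAR:-build/libs/nifi-script-tester-1.1.1-all.jar}"

# Send every file in a directory through a script, and print attributes and
# content for flow files routed to success (the default) or failure.
# Set DRY_RUN=1 to print the command instead of running it.
test_script_on_dir() {
    input_dir=$1
    script=$2
    # Options go before the script file, as in the usage text.
    set -- java -jar "$JAR" -failure -attrs -content "-input=$input_dir" "$script"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Something like `test_script_on_dir src/test/resources/input_files my_script.groovy` would then run the script once per file in that directory (my_script.groovy is a placeholder name).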

As a basic example, let's say I am in the nifi-script-tester Git repo, and I've built using "gradle shadowJar" so my JAR is in build/libs. I can run one of the basic unit tests like so:
java -jar build/libs/nifi-script-tester-1.1.1-all.jar src/test/resources/test_basic.js
Which gives the following output:
2016-06-09 14:20:49,787 INFO  [pool-1-thread-1]                nifi.script.ExecuteScript - ExecuteScript[id=9a507773-86ae-4c33-957f-1c0270302a0e] hello
Flow Files transferred to success: 0
This shows minimal output: just the logging from the framework and the default summary statistic of "Flow Files transferred to success". Now let's try a more comprehensive example, where I pipe in a JSON file, run my Groovy script from the JSON-to-JSON conversion post, and display all the attributes, contents, and statistics from the run:
cat src/test/resources/input_files/jolt.json | java -jar build/libs/nifi-script-tester-1.1.1-all.jar -all src/test/resources/test_json2json.groovy
This gives the following (much more verbose) output:
Flow file FlowFile[0,14636536169108_translated.json,283B]
---------------------------------------------------------
FlowFile Attributes
Key: 'entryDate'
	Value: 'Thu Jun 09 14:24:50 EDT 2016'
Key: 'lineageStartDate'
	Value: 'Thu Jun 09 14:24:50 EDT 2016'
Key: 'fileSize'
	Value: '283'
FlowFile Attribute Map Content
Key: 'path'
	Value: 'target'
Key: 'filename'
	Value: '14636536169108_translated.json'
Key: 'uuid'
	Value: '84d8d290-fbf9-4d57-aaf7-fd050da40d9f'
---------------------------------------------------------
{
    "Range": 5,
    "Rating": "3",
    "SecondaryRatings": {
        "metric": {
            "Id": "metric",
            "Range": 5,
            "Value": 6
        },
        "quality": {
            "Id": "quality",
            "Range": 5,
            "Value": 3
        }
    }
}

Flow Files transferred to success: 1

Flow Files transferred to failure: 0
There are options to suppress or select various things such as relationships, attributes, flowfile contents, etc. As a final example, let's look at the Hazelcast example from my previous post, to see how to add paths (files and directories) to the script tester, in the same way you'd set the Module Path property in the ExecuteScript processor:

java -jar ~/git/nifi-script-tester/build/libs/nifi-script-tester-1.1.1-all.jar -attrs -modules=/Users/mburgess/Downloads/hazelcast-3.6/lib hazelcast.groovy
And the output:

Jun 09, 2016 2:31:01 PM com.hazelcast.core.LifecycleService
INFO: HazelcastClient[hz.client_0_dev][3.6] is STARTING
Jun 09, 2016 2:31:01 PM com.hazelcast.core.LifecycleService
INFO: HazelcastClient[hz.client_0_dev][3.6] is STARTED
Jun 09, 2016 2:31:01 PM com.hazelcast.core.LifecycleService
INFO: HazelcastClient[hz.client_0_dev][3.6] is CLIENT_CONNECTED
Jun 09, 2016 2:31:01 PM com.hazelcast.client.spi.impl.ClientMembershipListener
INFO:

Members [1] {
	Member [172.17.0.2]:5701
}

Jun 09, 2016 2:31:01 PM com.hazelcast.core.LifecycleService
INFO: HazelcastClient[hz.client_0_dev][3.6] is SHUTTING_DOWN
Jun 09, 2016 2:31:01 PM com.hazelcast.core.LifecycleService
INFO: HazelcastClient[hz.client_0_dev][3.6] is SHUTDOWN
Flow file FlowFile[0,15008187914245.mockFlowFile,0B]
---------------------------------------------------------
FlowFile Attributes
Key: 'entryDate'
	Value: 'Thu Jun 09 14:31:01 EDT 2016'
Key: 'lineageStartDate'
	Value: 'Thu Jun 09 14:31:01 EDT 2016'
Key: 'fileSize'
	Value: '0'
FlowFile Attribute Map Content
Key: 'path'
	Value: 'target'
Key: 'hazelcast.customers.nifi'
	Value: '[name:Apache NiFi, email:nifi@apache.org, blog:nifi.apache.org]'
Key: 'filename'
	Value: '15008187914245.mockFlowFile'
Key: 'hazelcast.customers.mattyb149'
	Value: '[name:Matt Burgess, email:mattyb149@gmail.com, blog:funnifi.blogspot.com]'
Key: 'uuid'
	Value: '0a7c788e-0aef-40e4-b9a6-426f877dbfbe'
---------------------------------------------------------

Flow Files transferred to success: 1
This script would not work without the Hazelcast JARs, so this shows how the "-modules" option is used to add them to the classpath for testing.
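
Since -modules takes a comma-separated list, a tiny helper can build that list when you have several directories or JARs. This is just a sketch: join_modules is a name I made up, and the paths in the comment are placeholders.

```shell
#!/bin/sh
# Join any number of paths into the comma-separated form -modules expects.
# The body runs in a subshell so the IFS change does not leak to the caller.
join_modules() (
    IFS=,
    printf '%s\n' "$*"
)

# Example with hypothetical paths:
#   java -jar build/libs/nifi-script-tester-1.1.1-all.jar -attrs \
#       "-modules=$(join_modules "$HOME/hazelcast-3.6/lib" "$HOME/extra/util.jar")" \
#       hazelcast.groovy
```

Note that this simple join can't handle paths that themselves contain commas.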

The nifi-script-tester only supports JavaScript and Groovy at the moment; including Jython (for example) would increase the JAR's size by 500% :/ Right now it's only 8.6 MB, so a little big but not too bad.

Anyway, I hope this helps. Please let me know how/if it works for you!

Cheers,
Matt

6 comments:

  1. This is awesome. Thank you for saving me from hours and hours of debugging a hugeass JS script I was writing into Nifi.

  2. Hello Matt,

    great stuff. Just one question - is it possible to set up input flowfile attributes somehow?

    thx,
    Leszek

  3. I don't understand why I am getting the following error on this script: cannot create an instance from the abstract interface org.apache.nifi.processor.io.StreamCallback
    flowFile = session.get();
    if (flowFile != null) {

        var StreamCallback = Java.type("org.apache.nifi.processor.io.StreamCallback");
        var IOUtils = Java.type("org.apache.commons.io.IOUtils");
        var StandardCharsets = Java.type("java.nio.charset.StandardCharsets");
        var transformed_message = {};
        var error = false;
        var line = "ops_Powertrack";

        // Get attributes
        flowFile = session.write(flowFile, new StreamCallback(function (inputStream, outputStream) {
            // Read input FlowFile content
            var content = IOUtils.toString(inputStream, StandardCharsets.UTF_8); // message or content
            var message_content = {};
            try {
                message_content = JSON.parse(content);
                transformed_message.postID = (((message_content || {}).postID || "null"));
                transformed_message.contentType = (((message_content || {}).contentType || "null"));

                transformed_message.published = (((message_content || {}).published || "null"));
                transformed_message.crawled = (((message_content || {}).crawled || "null"));
                transformed_message.providerID = (((message_content || {}).providerID || "null"));

                line = line + " " + "postID=" + transformed_message.postID + ","
                    + "contentType=" + transformed_message.contentType + ","
                    + "published=" + transformed_message.published + ","
                    + "crawled=" + transformed_message.crawled + ","
                    + "providerID=" + transformed_message.providerID + ","
                    + " value=" + "1" + " "
                    + time * 1000000 + "\n";

                // Write output content
                if (transformed_message) {
                    outputStream.write(line.getBytes(StandardCharsets.UTF_8));
                }
            } catch (e) {
                error = true;
                outputStream.write(content.getBytes(StandardCharsets.UTF_8));
            }
        }));

        if (error) {
            session.transfer(flowFile, REL_FAILURE)
        } else {
            session.transfer(flowFile, REL_SUCCESS)
        }
    }

  4. Really a great tool, good job man, thank a lot

  5. Any chance this will be updated for latest version of nifi 1.9.x

  6. jfrog (gag) has taken over bintray. Your link to the jar is dead.
