Tuesday, May 24, 2016

Using Groovy @Grab with ExecuteScript

As we've seen in a previous post, it is possible to add dependencies/modules to your ExecuteScript classpath by using the Module Directory property in the processor configuration dialog.  However many third-party libraries have (sometimes lots of) transitive dependencies, and downloading all of them and/or setting up your Module Directory can become a pain.

The Groovy world has a solution for such a thing, using Groovy Grape and specifically the @Grab annotation. This instructs Groovy to download the artifact(s) and their dependencies into the Grape cache. It works much like Maven or Ivy, and in fact it is based on Ivy. For this reason, ExecuteScript (when using the Groovy engine) needs the Apache Ivy JAR.

You may be asking why the Ivy JAR is not included with the scripting NAR that has the InvokeScriptProcessor and ExecuteScript processors in it, or if we can use the Module Directory to point at the Ivy JAR and be on our way.  I haven't been able to verify in the Groovy code, but by experimentation it appears that Grape uses the application classloader, and not the current thread context's class loader, to find the Ivy classes it needs. This means the Ivy JAR needs to be in the "original" NiFi classpath (i.e. in the lib/ folder of NiFi) rather than in the NAR or specified in the Module Directory property.

You can download the Apache Ivy JAR here (I tested with Ivy 2.4.0), and place it in your NiFi distribution under lib/, then restart NiFi.

Now we can get to the script part :)  Let's say we want to write a Groovy script for ExecuteScript to validate an incoming flow file in JSON format against a JSON Schema. There is a good JSON Schema Validator library for this on Github. In this case, the library only has a couple of dependencies, and the only one not available to ExecuteScript is org.json:json, so it's not a real pain to use Module Directory for this, but I wanted to keep it simple to show the general idea.  We use @Grab to get this dependency (and its transitive dependencies), then import the classes we will end up using to do the JSON Schema validation:
@Grab(group='org.everit.json', module='org.everit.json.schema', version='1.3.0')
import org.everit.json.schema.Schema
import org.everit.json.schema.loader.SchemaLoader
import org.json.JSONObject
import org.json.JSONTokener

// Rest of code here
If you don't already have all the artifacts in your Grapes cache, then the very first time this script runs, it will be much slower as it has to download each artifact and its transitive dependencies. Of course, this means your NiFi instance has to be able to get out to the internet, and Grape will need to be configured properly for your machine. In many cases the default settings are fine, but check out the Grape documentation as well as the default grapeConfig.xml settings (note that this is an Ivysettings file). NOTE: The default location for the Grapes cache is under the user's home directory in ~/.groovy/grapes. If you are running NiFi as a certain user, then you will want to ensure the user has permission to write to that directory.

An alternative to the possibly-long-download solution is to pre-install the grapes on the machine manually. If Groovy is installed on the machine itself (versus the one supplied with ExecuteScript) you can do the following (see the doc for more details):
grape install <groupid> <artifactid> [<version>]
to install the grapes into the cache. Then ExecuteScript with @Grab will be faster out of the gate.

Hopefully this post has offered a way to make prototyping new capabilities with ExecuteScript and Groovy a little faster and easier.

Cheers!

No comments:

Post a Comment