I'll use the schema as it was presented to me on the mailing list:
{
  "type": "object",
  "required": ["name", "tags", "timestamp", "fields"],
  "properties": {
    "name": {"type": "string"},
    "timestamp": {"type": "integer"},
    "tags": {"type": "object", "items": {"type": "string"}},
    "fields": {"type": "object"}
  }
}

This shows that the incoming flow file should contain a JSON object, that it needs to have certain fields (the "required" values), and the types of the values it may/must contain (the "properties" entries). For this script I'll hard-code this schema, but I'll talk a bit at the end about how this can be done dynamically for a better user experience.
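To make the schema's requirements concrete, here is a minimal standalone sketch (run with plain Groovy, outside NiFi) that validates two sample documents against it. The sample documents are hypothetical, and the @Grab coordinates are assumed from the libraries' Maven coordinates (org.everit.json.schema 1.3.0 and org.json 20160212). One side note: in JSON Schema the "items" keyword only applies to arrays, so on the object-typed "tags" entry it should have no effect; constraining tag values to strings would normally be done with "additionalProperties": {"type": "string"}.

```groovy
// Assumption: these coordinates match the JARs you would otherwise add by hand
@Grapes([
    @Grab('org.everit.json:org.everit.json.schema:1.3.0'),
    @Grab('org.json:json:20160212')
])
import org.everit.json.schema.Schema
import org.everit.json.schema.ValidationException
import org.everit.json.schema.loader.SchemaLoader
import org.json.JSONObject
import org.json.JSONTokener

def schemaText = '''
{
  "type": "object",
  "required": ["name", "tags", "timestamp", "fields"],
  "properties": {
    "name": {"type": "string"},
    "timestamp": {"type": "integer"},
    "tags": {"type": "object", "items": {"type": "string"}},
    "fields": {"type": "object"}
  }
}
'''

// Parse the schema text as JSON, then load it into a Schema object
Schema schema = SchemaLoader.load(new JSONObject(new JSONTokener(schemaText)))

// Hypothetical sample documents, for illustration only
def conforming = '{"name": "cpu", "timestamp": 1465839830, "tags": {"host": "server01"}, "fields": {"usage": 0.64}}'
// "fields" is missing and "timestamp" is a string rather than an integer
def nonConforming = '{"name": "cpu", "timestamp": "yesterday", "tags": {"host": "server01"}}'

// Returns true when the document passes schema.validate(), false otherwise
def isValid = { String json ->
    try {
        schema.validate(new JSONObject(json))
        return true
    } catch (ValidationException ve) {
        println "Rejected: ${ve.message}"
        return false
    }
}

println "conforming document valid?     ${isValid(conforming)}"
println "non-conforming document valid? ${isValid(nonConforming)}"
```

The first document has all four required keys with the declared types; the second should be rejected on two counts.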
Since the schema itself is JSON, we use org.json.JSONObject and org.json.JSONTokener to read it in, then org.everit.json.schema.SchemaLoader to load it into a Schema. We read the flow file with session.read(), passing in a closure cast to InputStreamCallback (see my previous post for details), and call schema.validate() on its contents. If the JSON is not valid, validate() throws a ValidationException. When that happens, I set a "valid" variable to false; the flow file is then routed to SUCCESS or FAILURE depending on whether it validated against the schema. The original script is as follows:
import org.apache.nifi.processor.io.InputStreamCallback
import org.everit.json.schema.Schema
import org.everit.json.schema.loader.SchemaLoader
import org.json.JSONObject
import org.json.JSONTokener

flowFile = session.get()
if(!flowFile) return

// Hard-coded schema (see the end of the post for making this configurable)
jsonSchema = """
{
  "type": "object",
  "required": ["name", "tags", "timestamp", "fields"],
  "properties": {
    "name": {"type": "string"},
    "timestamp": {"type": "integer"},
    "tags": {"type": "object", "items": {"type": "string"}},
    "fields": {"type": "object"}
  }
}
"""

boolean valid = true
session.read(flowFile, { inputStream ->
    // Read the whole flow file content as a UTF-8 string
    jsonInput = org.apache.commons.io.IOUtils.toString(inputStream, java.nio.charset.StandardCharsets.UTF_8)
    JSONObject rawSchema = new JSONObject(new JSONTokener(new ByteArrayInputStream(jsonSchema.bytes)))
    Schema schema = SchemaLoader.load(rawSchema)
    try {
        schema.validate(new JSONObject(jsonInput))
    } catch(ve) {
        // Any exception here (schema violation or unparseable JSON) marks the flow file invalid
        log.error("Doesn't adhere to schema", ve)
        valid = false
    }
} as InputStreamCallback)

session.transfer(flowFile, valid ? REL_SUCCESS : REL_FAILURE)
This is a pretty basic script; there are several things we could do to improve it:
- Move the schema load out of the session.read() method, since it doesn't require the input
- Allow the user to specify the schema via a dynamic property
- Do better exception handling and error message reporting
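As a rough illustration of these three points, here is a hedged sketch of how the ExecuteScript body might look. It assumes your NiFi version binds dynamic properties into the script as PropertyValue variables, and that you have added a dynamic property named jsonSchema holding the schema text; that property name and the json.validation.error attribute are hypothetical names chosen for this example. Note that the schema is still loaded once per flow file, just no longer inside the read callback.

```groovy
import org.apache.nifi.processor.io.InputStreamCallback
import org.everit.json.schema.Schema
import org.everit.json.schema.ValidationException
import org.everit.json.schema.loader.SchemaLoader
import org.json.JSONException
import org.json.JSONObject
import org.json.JSONTokener

flowFile = session.get()
if (!flowFile) return

// Load the schema before touching the flow file content.
// Assumption: "jsonSchema" is a user-defined dynamic property on the processor.
Schema schema
try {
    schema = SchemaLoader.load(new JSONObject(new JSONTokener(jsonSchema.value)))
} catch (e) {
    log.error('Could not load a JSON schema from the jsonSchema property', e)
    session.transfer(flowFile, REL_FAILURE)
    return
}

// Collect human-readable problems instead of a single boolean
def errors = []
session.read(flowFile, { inputStream ->
    try {
        schema.validate(new JSONObject(inputStream.getText('UTF-8')))
    } catch (ValidationException ve) {
        // The top-level message summarizes; causing exceptions hold the individual violations
        errors << ve.message
        ve.causingExceptions.each { errors << it.message }
    } catch (JSONException je) {
        // The content could not be parsed as a JSON object at all
        errors << ('Not a JSON object: ' + je.message)
    }
} as InputStreamCallback)

if (errors) {
    String summary = errors.join('; ')
    log.error('Flow file does not adhere to the schema: ' + summary)
    // Hypothetical attribute name, so downstream processors can see why validation failed
    flowFile = session.putAttribute(flowFile, 'json.validation.error', summary)
    session.transfer(flowFile, REL_FAILURE)
} else {
    session.transfer(flowFile, REL_SUCCESS)
}
```

Separating ValidationException from JSONException lets the error message distinguish "valid JSON that violates the schema" from "not JSON at all", which the original catch-all could not do.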
A worthwhile improvement (one that would include all of these) is to turn the script into a proper Processor and run it in InvokeScriptedProcessor. That way you could define a custom set of relationships and properties, making it easy for the user to configure and use.
Of course, the best solution is probably to implement it in Java and contribute it to Apache NiFi under the Jira case NIFI-1893 :)
Cheers!
thanks for the script.
forgot to mention
"need to download the two JAR dependencies ([2] and
[3]) and add them to your Module Directory property."
[2] http://mvnrepository.com/artifact/org.everit.json/org.everit.json.schema/1.3.0
[3] http://mvnrepository.com/artifact/org.json/json/20160212
Oops, yep I should have added that and/or the reference to the @Grab line from the previous post. Good catch, thanks!