From the course: Data Management with Apache NiFi

Configure processors to convert data formats - Apache NiFi Tutorial

- [Instructor] Before we set up the next processors in our data flow, I should tell you what they'll be doing. We have our SQL query, where we group and aggregate the data, and we want the result finally written out as a CSV file on our local machine. The database query that we run using ExecuteSQL produces data in the Avro format. In this flow, we won't go straight from Avro to CSV. We'll first convert Avro to JSON, and then JSON to CSV. And it's to do exactly this that we'll set up the next processor.

This processor is going to be the Avro-to-JSON converter. Search for "convert," select "ConvertAvroToJSON," and place it on your canvas. Let's double-click and configure its properties. I'll leave it to you to explore what properties we can configure on this processor, but for this demo, we can accept the default values for all of them. The only thing I'll do here is terminate the "failure" relationship. This is all the configuration that ConvertAvroToJSON needs. Connect the output of ExecuteSQL to ConvertAvroToJSON. So in the case of success, the Avro output of the SQL query will be converted to JSON.

And once it's in the JSON format, we can convert it to CSV using another processor. In the processor dialog, search for and select "ConvertRecord." This processor converts records from one data format to another. You need to configure the Record Reader and Record Writer controller services, and it can convert between any formats that have a reader and a writer. Well, we want to convert from JSON to CSV. Double-click ConvertRecord and set up a configuration to do exactly this.

Within the "Properties" tab, you'll find an option for Record Reader and Record Writer. Record Reader is what you use to specify what format the records will be coming in as. We know that the records will arrive in the JSON format, so we cannot use our already existing CSVReader controller service. We'll have to create a new service to read JSON records. Select this dropdown and choose the JsonTreeReader option here. Remember, we've converted the Avro output of ExecuteSQL to the JSON format, and it is this JSON that we want to read and convert to CSV. Click on the "Create" option, and you can click on this little arrow to configure the JsonTreeReader controller service. So let's save the existing processor properties and head over to configure the JsonTreeReader service. Click on the little gear icon. Now, I did say we have to configure this, but in fact, we can simply accept all of the default values for this JsonTreeReader service. Here are all of the properties that you could configure. We won't change any of these. That's it for the JsonTreeReader controller service.

We can head back to our data flow and continue with our configuration of the ConvertRecord processor. We've configured the Record Reader. Now it's time for us to configure the Record Writer. What format do we want to convert to? We want our final output to be in the CSV format, which is why we need to create a new service to write it. We previously used the CSVReader controller service. I'll now configure a CSV writer controller service. Choose the CSVRecordSetWriter from the "Add Controller Service" dropdown. And once again, you'll be prompted to go ahead and configure the settings for this controller service. Make sure you save all of the changes you've made to ConvertRecord, then click on the little arrow. Now click on the little gear icon to configure the CSVRecordSetWriter. And the cool thing here is you do not have to change any of the default properties. There are many properties here that you can configure to get your data in exactly the format you want it to be in. The properties are self-explanatory, and they have great help text. I'll leave it to you to explore those at leisure. But for this demo, we can just stick with the defaults.

At this point, we've set up two controller services for the ConvertRecord processor: the Record Reader and the Record Writer. There are no other changes we need to make in the properties, but we do need to update the relationships. Make sure you terminate the "failure" relationship. That completes our configuration of ConvertRecord. For our JSON records to actually be converted to CSV, we still need to set up a connection. Connect the success relationship of ConvertAvroToJSON to ConvertRecord. With this wired up, we are close to completing our data flow, but we are not there yet. There are a few more processors to set up.
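
To make the reader and writer idea concrete outside of NiFi, here is a minimal Python sketch of what a JSON-to-CSV record conversion amounts to. It is only an analogy for pairing a JSON reader with a CSV writer, not how NiFi implements ConvertRecord. The function name `json_records_to_csv` and the sample records are made up for illustration, and the sketch assumes flat (non-nested) JSON objects.

```python
import csv
import io
import json


def json_records_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text with a header row."""
    # "Reader" side: parse the incoming JSON into a list of records.
    records = json.loads(json_text)
    if isinstance(records, dict):
        # A single JSON object is treated as one record.
        records = [records]
    if not records:
        return ""

    # Column order follows the first appearance of each key across all records.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    # "Writer" side: emit a header line, then one CSV line per record.
    # Keys missing from a record are written as empty fields.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical aggregated rows, standing in for the query result after it
# has been converted from Avro to JSON.
sample_json = '[{"category": "books", "total": 120.5}, {"category": "toys", "total": 87}]'
csv_text = json_records_to_csv(sample_json)
print(csv_text)
```

The split between parsing and writing mirrors the separation between a Record Reader and a Record Writer: swapping either half changes the input or output format without touching the other.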