Data Injection into a Database Through NiFi
Apache NiFi is a great tool for building
flexible and performant data ingestion pipelines. Since relational
databases are a staple for many data cleaning, storage, and reporting
applications, it makes sense to use NiFi as an ingestion tool for MySQL,
SQL Server, Postgres, Oracle, etc. In this article we will look at how Apache
NiFi gets FlowFile data into a database.
Prerequisites:
- HDF 2.5
- Apache NiFi
- A relational database: MySQL, SQL Server, Postgres, Oracle, etc.
Database:
The database I am using here is Postgres. First create a database and a table, for example:
create table tablename (id varchar, fname varchar, lname varchar, gender varchar, city varchar, state varchar);
(Postgres accepts varchar without a length; specify lengths such as varchar(50) if you want to bound the columns.)
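For illustration in the steps below, assume each record eventually reaches the flow as one comma-separated line whose fields match the table columns (a hypothetical sample, not from the original template):

1,John,Doe,M,Dallas,TX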
NiFi:
1. DBCPConnectionPool: Provides a Database Connection Pooling Service. Connections can be requested from the pool and returned after usage. More details here
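As a rough sketch, a DBCPConnectionPool controller service for the Postgres database above might be configured like this (host, port, database name, credentials, and driver path are assumptions; adjust them to your environment):

Database Connection URL = jdbc:postgresql://localhost:5432/testdb
Database Driver Class Name = org.postgresql.Driver
Database Driver Location(s) = /usr/share/java/postgresql.jar   (named Database Driver Jar Url in some older NiFi releases)
Database User = postgres
Password = <your password>

Enable the service before starting the flow; PutSQL will reference it later.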
2. GetHTTP: Fetches data from an HTTP or HTTPS URL and
writes the data to the content of a FlowFile. Once the content has been
fetched, the ETag and Last Modified dates are remembered (if the web
server supports these concepts). This allows the Processor to fetch new
data only if the remote data has changed or until the state is cleared.
That is, once the content has been fetched from the given URL, it will
not be fetched again until the content on the remote server changes.
Note that due to limitations on state management, stored "last modified"
and etag fields never expire. If the URL in GetHttp uses Expression
Language that is unbounded, there is the potential for Out of Memory
Errors to occur. More details here
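A minimal GetHTTP configuration, assuming a hypothetical endpoint that serves the records (the URL and filename below are placeholders, not from the original post):

URL = http://example.com/data/users.json
Filename = users.json

GetHTTP runs on a timer and emits one FlowFile containing the fetched body whenever the remote content changes.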
3. EvaluateJsonPath: Evaluates one or more JsonPath expressions
against the content of a FlowFile. The results of those expressions are
assigned to FlowFile Attributes or are written to the content of the
FlowFile itself, depending on configuration of the Processor. JsonPaths
are entered by adding user-defined properties; the name of the property
maps to the Attribute Name into which the result will be placed (if the
Destination is flowfile-attribute; otherwise, the property name is
ignored). The value of the property must be a valid JsonPath expression.
A Return Type of 'auto-detect' will make a determination based off the
configured destination. When 'Destination' is set to
'flowfile-attribute,' a return type of 'scalar' will be used. When
'Destination' is set to 'flowfile-content,' a return type of 'JSON' will
be used. If the JsonPath evaluates to a JSON array or JSON object and
the Return Type is set to 'scalar', the FlowFile will be unmodified and
will be routed to failure. A Return Type of JSON can return scalar
values if the provided JsonPath evaluates to the specified value and
will be routed as a match. If Destination is 'flowfile-content' and the
JsonPath does not evaluate to a defined path, the FlowFile will be
routed to 'unmatched' without having its contents modified. If
Destination is flowfile-attribute and the expression matches nothing,
attributes will be created with empty strings as the value, and the
FlowFile will always be routed to 'matched'. More details here
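As a generic illustration of the attribute-extraction pattern (the JsonPaths assume a record shaped like {"id":"1","fname":"John",...}; adapt them to your own source, since the original template's exact configuration is not shown):

Destination = flowfile-attribute
Return Type = auto-detect
id = $.id
fname = $.fname
lname = $.lname

Each user-defined property (id, fname, lname) becomes a FlowFile attribute holding the matched value.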
4. SplitText: Splits a text file into multiple smaller
text files on line boundaries limited by maximum number of lines or
total size of fragment. Each output split file will contain no more than
the configured number of lines or bytes. If both Line Split Count and
Maximum Fragment Size are specified, the split occurs at whichever limit
is reached first. If the first line of a fragment exceeds the Maximum
Fragment Size, that line will be output in a single split file which
exceeds the configured maximum size limit. This component also allows
one to specify that each split should include header lines. Header
lines can be computed either by specifying the number of lines that
should constitute a header or by using a header marker to match against
the read lines. If such a match happens, the corresponding line will
be treated as a header. Keep in mind that upon the first failure of a header
marker match, no more matches will be performed and the rest of the
data will be parsed as regular lines for a given split. If after
computation of the header there is no more data, the resulting split
will consist of only header lines. More details here
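To emit one record per FlowFile, with no header lines, a typical SplitText configuration is:

Line Split Count = 1
Header Line Count = 0
Remove Trailing Newlines = true

Downstream processors then see exactly one line of content per FlowFile.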
5. ExtractText: Evaluates one or more Regular Expressions
against the content of a FlowFile. The results of those Regular
Expressions are assigned to FlowFile Attributes. Regular Expressions are
entered by adding user-defined properties; the name of the property
maps to the Attribute Name into which the result will be placed. The
first capture group, if any found, will be placed into that attribute
name. But all capture groups, including the matching string sequence
itself, will also be provided at that attribute name with an index value
provided, with the exception of a capturing group that is optional and
does not match - for example, given the attribute name "regex" and
expression "abc(def)?(g)" we would add an attribute "regex.1" with a
value of "def" if the "def" matched. If the "def" did not match, no
attribute named "regex.1" would be added but an attribute named
"regex.2" with a value of "g" will be added regardless. The value of the
property must be a valid Regular Expression with one or more capturing
groups. If the Regular Expression matches more than once, only the first
match will be used unless the property enabling repeating capture group
is set to true. If any provided Regular Expression matches, the
FlowFile(s) will be routed to 'matched'. If no provided Regular
Expression matches, the FlowFile will be routed to 'unmatched' and no
attributes will be applied to the FlowFile. More details here
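A sketch of an ExtractText user-defined property for the comma-separated sample shown earlier (the property name csv and the six-field layout are assumptions):

csv = ^([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)$

A matching line yields attributes csv.1 through csv.6 (plus csv.0 for the whole match), which the next processor can reference.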
6. ReplaceText: Updates the content of a FlowFile by
evaluating a Regular Expression (regex) against it and replacing the
section of the content that matches the Regular Expression with some
alternate value. More details here
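One hedged sketch: rewrite each one-line FlowFile into an INSERT statement using the csv.N attributes from ExtractText (table and column names follow the example table above):

Replacement Strategy = Regex Replace
Evaluation Mode = Entire text
Search Value = (?s)(^.*$)
Replacement Value = INSERT INTO tablename (id, fname, lname, gender, city, state) VALUES ('${csv.1}', '${csv.2}', '${csv.3}', '${csv.4}', '${csv.5}', '${csv.6}')

Inlining values like this is simple but fragile against quoting problems; the parameterized form shown under PutSQL below is safer.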
7. PutSQL: Executes a SQL UPDATE or INSERT command.
The content of an incoming FlowFile is expected to be the SQL command to
execute. The SQL command may use the ? to escape parameters. In this
case, the parameters to use must exist as FlowFile attributes with the
naming convention sql.args.N.type and sql.args.N.value, where N is a
positive integer. The sql.args.N.type is expected to be a number
indicating the JDBC Type. The content of the FlowFile is expected to be
in UTF-8 format. More details here
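Continuing the example, a parameterized alternative: have ReplaceText set the FlowFile content to the statement below and supply the values as attributes (types use java.sql.Types codes; 12 = VARCHAR). This is a sketch, not necessarily the original template's exact configuration:

Content: INSERT INTO tablename (id, fname, lname, gender, city, state) VALUES (?, ?, ?, ?, ?, ?)
sql.args.1.type = 12
sql.args.1.value = 1
sql.args.2.type = 12
sql.args.2.value = John
(and so on for the remaining four columns)

On the PutSQL processor itself, set JDBC Connection Pool to the DBCPConnectionPool service configured in step 1.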
NiFi Data Flow:
The processors above are wired in order: GetHTTP -> EvaluateJsonPath -> SplitText -> ExtractText -> ReplaceText -> PutSQL, with the DBCPConnectionPool controller service backing PutSQL. My template is attached here.
References:
- https://www.batchiq.com/database-ingest-with-nifi.html
- http://funnifi.blogspot.in/2016/04/sql-in-nifi-with-executescript.html
- https://community.hortonworks.com/questions/54538/nifi-ingesting-a-file-from-sftp-and-insert-into-my.html