Talend - Query & Insert from MongoDB
GitHub project: example-talend-query-insert-update-from-MongoDB
Preamble
Configuration: Context
To create the jobs shown in this article, first create a context in the repository and give its variables values.
Query from MongoDB (with all versions of Data Fabric)
Example: Count the number of rows.
- Create a new job
- Add the component "tMongoDBConnection" : Allows the creation of a MongoDB connection.
- Add the component "tMongoDBInput" : Reads collection documents based on a query document.
- Add the component "tAggregateRow" : Receives a flow and aggregates it based on one or more columns.
- Add the component "tLogRow" : Displays the result.
- Create links:
  - "tMongoDBConnection" is connected with "tMongoDBInput" (through "OnSubjobOk")
  - "tMongoDBInput" is connected with "tAggregateRow" (through "Main")
  - "tAggregateRow" is connected with "tLogRow" (through "Main")
- Double click on "tMongoDBConnection" and set its properties:
  - Enter a host
  - Enter a port
  - Enter a database name
  - Enter a user
  - Enter a password
- Double click on the component "tMongoDBInput":
  - Tick "Use an existing connection"
  - Enter a collection name
  - Click on "Edit schema"
    - Enter the column names
  - Optionally, enter your query
- Double click on the component "tAggregateRow":
  - Click on "Edit schema"
    - Add the column "sum" to the output
  - In the "Operations" field, set the properties of the operation that fills the column "sum"
- Run the job
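To make the data flow of this job easier to picture, here is a minimal plain-Java sketch of what the three components compute together. It is only an illustration, not the code Talend generates: it uses no MongoDB driver and no Talend API, the class `CountRowsSketch`, the method `countRows` and the sample documents are hypothetical, and it assumes the operation set on "tAggregateRow" is a count with no group-by column.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration of the job's data flow; not Talend-generated code.
class CountRowsSketch {

    // Stand-in for tMongoDBInput (optional query) followed by
    // tAggregateRow (assumed here to count rows without grouping).
    static long countRows(List<Map<String, Object>> documents,
                          Predicate<Map<String, Object>> query) {
        return documents.stream().filter(query).count();
    }

    public static void main(String[] args) {
        // Made-up documents standing in for the collection content.
        List<Map<String, Object>> documents = List.of(
                Map.of("city", "Paris"),
                Map.of("city", "Lyon"),
                Map.of("city", "Paris"));

        // No query entered: every document is counted.
        long sum = countRows(documents, doc -> true);

        // Stand-in for tLogRow: prints sum=3
        System.out.println("sum=" + sum);
    }
}
```

Passing a predicate such as `doc -> "Paris".equals(doc.get("city"))` plays the role of the optional query on "tMongoDBInput": only matching documents reach the count.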
Insert from MongoDB (with Data Fabric <= v.1.5)
This approach isn't available with Data Fabric >= 1.6 because it isn't secure.
Workarounds:
- Use Hive components
- Modify the configuration file in Talend
Example: Insert ten rows (from an existing Impala table)
- Create a new job
- Add the component "tMongoDBConnection" : Allows the creation of a MongoDB connection.
- Add the component "tImpalaConnection" : Allows the creation of an Impala connection.
- Add the component "tImpalaInput": Reads an Impala table and extracts fields based on an Impala Query.
- Add the component "tMongoDBOutput": Inserts or updates documents in a MongoDB collection.
- Create links:
  - "tMongoDBConnection" is connected with "tImpalaConnection" (through "OnSubjobOk")
  - "tImpalaConnection" is connected with "tImpalaInput" (through "OnSubjobOk")
  - "tImpalaInput" is connected with "tMongoDBOutput" (through "Main")
- Double click on "tMongoDBConnection" and set its properties:
  - Enter a host
  - Enter a port
  - Enter a database name
  - Enter a user
  - Enter a password
- Double click on "tImpalaConnection" and set its properties:
  - Select the "Cloudera" distribution and the latest Cloudera version
  - Enter a host
  - Enter a port
  - Enter a database name
- Double click on the component "tImpalaInput":
  - Tick "Use an existing connection"
  - Click on "Edit schema"
    - Enter a variable "flow"
  - Enter a table name
  - Enter your query: "SELECT * FROM "+context.Table_Impala+" LIMIT 10"
    Use context variables where possible, for example "+context.Table_Impala+" for the table name.
- Double click on the component "tMongoDBOutput":
  - Tick "Use an existing connection"
  - Enter a collection name
- Run the job
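The query entered in "tImpalaInput" is an ordinary Java string expression, so it can be useful to see how it evaluates outside the Studio. The sketch below is an assumption-laden illustration, not Talend-generated code: the nested `Context` class is a hypothetical stand-in for the job's context object, the value `my_db.my_table` is made up, and the name check is an extra precaution that the original job does not include.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of how the tImpalaInput query string is built.
class ImpalaQuerySketch {

    // Stand-in for the context object of the job; only the variable
    // used in this article (Table_Impala) is modelled.
    static class Context {
        String Table_Impala;

        Context(String tableImpala) {
            this.Table_Impala = tableImpala;
        }
    }

    // Assumed naming rule for this sketch: "table" or "database.table",
    // made of letters, digits and underscores.
    private static final Pattern SAFE_NAME =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*(\\.[A-Za-z_][A-Za-z0-9_]*)?");

    static String buildQuery(Context context) {
        String table = context.Table_Impala;
        if (table == null || !SAFE_NAME.matcher(table).matches()) {
            throw new IllegalArgumentException("Unexpected table name: " + table);
        }
        // Same concatenation as in the query field of tImpalaInput.
        return "SELECT * FROM " + context.Table_Impala + " LIMIT 10";
    }

    public static void main(String[] args) {
        Context context = new Context("my_db.my_table"); // made-up value
        // Prints: SELECT * FROM my_db.my_table LIMIT 10
        System.out.println(buildQuery(context));
    }
}
```

Because the table name is concatenated into the SQL text, a context value taken from an untrusted source could change the query, which is why the sketch rejects anything that does not look like a plain table name.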