Talend - Query from and Insert into MongoDB
GitHub project: example-talend-query-insert-update-from-MongoDB
Preamble
Configuration: Context
To create the jobs shown in this article, you first have to create a context repository and fill in its values.
Query from MongoDB (with all versions of Data Fabric)
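Talend gives a job access to its context variables through a `context` object (the insert example later in this article uses `context.Table_Impala`). The sketch below is a plain-Java illustration of such a context group. Only `Table_Impala` comes from this article; the MongoDB variable names and the `JobContext` class are hypothetical, and Talend generates its own context code rather than this class.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical stand-in for a Talend context group: each field mirrors one context variable.
// Only Table_Impala is taken from the article; the other names are illustrative assumptions.
class JobContext {
    String MongoDB_Host;
    int MongoDB_Port;
    String MongoDB_Database;
    String MongoDB_User;
    String MongoDB_Password;
    String Table_Impala;

    // Builds a context from key=value lines, such as the contents of a .properties file.
    static JobContext load(String keyValues) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(keyValues));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        JobContext c = new JobContext();
        c.MongoDB_Host = p.getProperty("MongoDB_Host", "localhost");
        // 27017 is MongoDB's default port, used when no value is provided.
        c.MongoDB_Port = Integer.parseInt(p.getProperty("MongoDB_Port", "27017"));
        c.MongoDB_Database = p.getProperty("MongoDB_Database");
        c.MongoDB_User = p.getProperty("MongoDB_User");
        c.MongoDB_Password = p.getProperty("MongoDB_Password");
        c.Table_Impala = p.getProperty("Table_Impala");
        return c;
    }

    public static void main(String[] args) {
        JobContext context = JobContext.load(
            "MongoDB_Host=mongo.example.com\nMongoDB_Port=27017\nMongoDB_Database=demo\n"
            + "MongoDB_User=talend\nMongoDB_Password=secret\nTable_Impala=old_table\n");
        System.out.println(context.MongoDB_Host + ":" + context.MongoDB_Port);
    }
}
```

Keeping the values outside the job means the same job can target another environment by changing only the context values.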
Example: Count the number of lines.
- Create a new job
- Add the component "tMongoDBConnection": Allows the creation of a MongoDB connection.
- Add the component "tMongoDBInput": Reads collection documents based on a query document.
- Add the component "tAggregateRow": Receives a flow and aggregates it based on one or more columns.
- Add the component "tLogRow": Displays the result.
- Create links:
- "tMongoDBConnection" is connected with "tMongoDBInput" (through "OnSubjobOk")
- "tMongoDBInput" is connected with "tAggregateRow" (through "Main")
- "tAggregateRow" is connected with "tLogRow" (through "Main")
- Double-click "tMongoDBConnection" and set its properties:
- Enter the host
- Enter the port
- Enter the database name
- Enter the user
- Enter the password
- Double-click the "tMongoDBInput" component:
- Tick "Use an existing connection"
- Enter the collection name
- Click "Edit a schema"
- Enter the column names
- Optionally, enter your query
- Double-click the "tAggregateRow" component:
- Click "Edit a schema"
- Add the output column "sum"
- In the "operations" field, set its properties (for this example, a count into the output column "sum")
- Click "Edit a schema"
- Run the job
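To make the aggregation step concrete, here is a plain-Java sketch of what this job computes, assuming in-memory rows in place of a live tMongoDBInput flow. It is not Talend's generated code and uses no MongoDB driver; the column names and the `CountRowsSketch` class are hypothetical. The `countBy` method mimics a tAggregateRow count operation whose result goes into the output column "sum".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the "Count the number of lines" job; no MongoDB driver involved.
// The rows stand in for what tMongoDBInput would emit; column names are hypothetical.
class CountRowsSketch {

    // Mimics tAggregateRow with a count operation: groups rows by the given columns
    // and counts the rows in each group. With no group-by columns, every row shares
    // the empty key, so the single count is the total number of rows.
    static Map<List<Object>, Long> countBy(List<Map<String, Object>> rows, List<String> groupBy) {
        Map<List<Object>, Long> sums = new LinkedHashMap<>();
        for (Map<String, Object> row : rows) {
            List<Object> key = new ArrayList<>();
            for (String column : groupBy) {
                key.add(row.get(column));
            }
            sums.merge(key, 1L, Long::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(
            Map.<String, Object>of("name", "a", "city", "Paris"),
            Map.<String, Object>of("name", "b", "city", "Lyon"),
            Map.<String, Object>of("name", "c", "city", "Paris"));
        // Stand-in for tLogRow: print the output column "sum".
        System.out.println("sum=" + countBy(rows, List.of()).get(List.of())); // prints sum=3
        System.out.println(countBy(rows, List.of("city")));
    }
}
```

With no group-by column the whole flow forms one group, which corresponds to counting the number of lines; adding a group-by column yields one count per distinct value.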
Insert into MongoDB (with Data Fabric <= v1.5)
This procedure is not available with Data Fabric >= 1.6 because it is not secure.
Solution:
- Use the Hive components
- Modify the configuration file in Talend
Example: Insert ten rows from an existing Impala table
- Create a new job
- Add the component "tMongoDBConnection": Allows the creation of a MongoDB connection.
- Add the component "tImpalaConnection": Allows the creation of an Impala connection.
- Add the component "tImpalaInput": Reads an Impala table and extracts fields based on an Impala query.
- Add the component "tMongoDBOutput": Inserts or updates documents in a MongoDB collection.
- Create links:
- "tMongoDBConnection" is connected with "tImpalaConnection" (through "OnSubjobOk")
- "tImpalaConnection" is connected with "tImpalaInput" (through "OnSubjobOk")
- "tImpalaInput" is connected with "tMongoDBOutput" (through "Main")
- Double-click "tMongoDBConnection" and set its properties:
- Enter the host
- Enter the port
- Enter the database name
- Enter the user
- Enter the password
- Double-click "tImpalaConnection" and set its properties:
- Select the "Cloudera" distribution and the latest Cloudera version
- Enter the host
- Enter the port
- Enter the database name
- Double-click the "tImpalaInput" component:
- Tick "Use an existing connection"
- Click "Edit a schema"
- Add a column named "flow"
- Enter the table name
- Enter your query: "SELECT * FROM "+context.Table_Impala+" LIMIT 10"
Use context variables where possible, as with "+context.Table_Impala+" above.
- Double-click the "tMongoDBOutput" component:
- Tick "Use an existing connection"
- Enter the collection name
- Run the job
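The tImpalaInput query is a Java string expression, so the context variable is concatenated into the SQL at run time. The plain-Java sketch below shows that expression and a simplified view of how each row with the "flow" column could become a document for tMongoDBOutput. The `Context` class, the `ImpalaToMongoSketch` class and the sample data are hypothetical; the code does not connect to Impala or MongoDB and is not Talend's generated code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the insert job's data path; no Impala or MongoDB driver involved.
class ImpalaToMongoSketch {

    // Hypothetical stand-in for Talend's context object; only Table_Impala comes from the article.
    static class Context {
        String Table_Impala;
        Context(String tableImpala) { this.Table_Impala = tableImpala; }
    }

    // Same string expression as the tImpalaInput query field.
    static String buildQuery(Context context) {
        return "SELECT * FROM " + context.Table_Impala + " LIMIT 10";
    }

    // Turns each row (values in schema order) into a document keyed by column name,
    // a simplified view of what reaches tMongoDBOutput through the "Main" link.
    static List<Map<String, Object>> toDocuments(List<String> schema, List<List<Object>> rows) {
        List<Map<String, Object>> documents = new ArrayList<>();
        for (List<Object> row : rows) {
            if (row.size() != schema.size()) {
                throw new IllegalArgumentException("Row does not match schema: " + row);
            }
            Map<String, Object> document = new LinkedHashMap<>();
            for (int i = 0; i < schema.size(); i++) {
                document.put(schema.get(i), row.get(i));
            }
            documents.add(document);
        }
        return documents;
    }

    public static void main(String[] args) {
        Context context = new Context("old_table");
        System.out.println(buildQuery(context)); // prints SELECT * FROM old_table LIMIT 10
        List<List<Object>> rows = List.of(List.<Object>of("first"), List.<Object>of("second"));
        System.out.println(toDocuments(List.of("flow"), rows));
    }
}
```

Because the table name comes from the context, the same job can read another Impala table without editing the query.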