Talend - Query from & Insert into MongoDB

Preamble

Configuration: Context

To create the jobs shown in this article, first create a context group in the repository and assign values to its variables.

Query from MongoDB (with all versions of Data Fabric)

Example: Count the number of rows.

  • Create a new job
  • Add the component "tMongoDBConnection": Creates a MongoDB connection.
  • Add the component "tMongoDBInput": Reads the documents of a collection based on a query document.
  • Add the component "tAggregateRow": Receives a flow and aggregates it based on one or more columns.
  • Add the component "tLogRow": Displays the result.
  • Create links:
    • "tMongoDBConnection" is connected with "tMongoDBInput" (through "OnSubjobOk")
    • "tMongoDBInput" is connected with "tAggregateRow" (through "Main")
    • "tAggregateRow" is connected with "tLogRow" (through "Main")

  • Double click on "tMongoDBConnection" and set its properties:
    • Enter a host
    • Enter a port
    • Enter the database name
    • Enter a user
    • Enter a password

  • Double click on the component "tMongoDBInput":
    • Tick "Use an existing connection"
    • Enter the collection name
    • Click on "Edit schema"
      • Enter the column names
    • Optionally, enter a query
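
The query field of "tMongoDBInput" holds a Java string containing a MongoDB query document. A minimal sketch of how such a string might be built is shown here; the class name, the helper methods and the field "status" are hypothetical and are not part of the job described in this article.

```java
class MongoQuerySketch {

    // An empty query document matches every document in the collection.
    static String matchAll() {
        return "{}";
    }

    // Builds a query document matching documents whose field equals a string value.
    // Assumption: field and value contain no characters that need JSON escaping.
    static String equalsQuery(String field, String value) {
        return "{\"" + field + "\": \"" + value + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(matchAll());
        // "status" and "active" are illustrative values only.
        System.out.println(equalsQuery("status", "active"));
    }
}
```

In the component itself only the resulting string expression is entered, for example "{}" to read all documents.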

  • Double click on the component "tAggregateRow":
    • Click on "Edit schema"
      • Add the column "sum" to the output
    • In the "Operations" table, set the aggregation that fills the column "sum" (for this example, a count)

  • Run the job
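
To make the counting step concrete, here is a rough plain-Java sketch of what the aggregation amounts to when no grouping column is set: every incoming row increments a single counter, and one result row is emitted. This is not the code Talend generates; the class name, the row representation as a Map and the column "name" are assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;

class CountRowsSketch {

    // Counts the rows of a flow, one increment per incoming row,
    // the way a count operation without a "group by" column would.
    static long countRows(List<? extends Map<String, ?>> rows) {
        long sum = 0;
        for (Map<String, ?> row : rows) {
            sum++;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical documents read from the collection.
        List<Map<String, String>> rows = List.of(
                Map.of("name", "a"),
                Map.of("name", "b"),
                Map.of("name", "c"));
        // The single output row, as "tLogRow" might display it.
        System.out.println("sum=" + countRows(rows));
    }
}
```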

Insert into MongoDB (with Data Fabric <= v.1.5)

This procedure isn't available with Data Fabric >= 1.6 because it's not secure.

Workarounds:

  • Use Hive components
  • Modify the configuration file in Talend

Example: Insert ten rows (from an existing Impala table)

  • Create a new job
  • Add the component "tMongoDBConnection": Creates a MongoDB connection.
  • Add the component "tImpalaConnection": Creates an Impala connection.
  • Add the component "tImpalaInput": Reads an Impala table and extracts fields based on an Impala query.
  • Add the component "tMongoDBOutput": Inserts or updates documents in a MongoDB collection.
  • Create links:
    • "tMongoDBConnection" is connected with "tImpalaConnection" (through "OnSubjobOk")
    • "tImpalaConnection" is connected with "tImpalaInput" (through "OnSubjobOk")
    • "tImpalaInput" is connected with "tMongoDBOutput" (through "Main")

  • Double click on "tMongoDBConnection" and set its properties:
    • Enter a host
    • Enter a port
    • Enter the database name
    • Enter a user
    • Enter a password

  • Double click on "tImpalaConnection" and set its properties:
    • Select the "Cloudera" distribution and its latest version
    • Enter a host
    • Enter a port
    • Enter the database name

  • Double click on the component "tImpalaInput":
    • Tick "Use an existing connection"
    • Click on "Edit schema"
      • Add a column named "flow"
    • Enter a table name
    • Enter your query: "SELECT * FROM "+context.Table_Impala+" LIMIT 10"
      Use context variables where possible, as with context.Table_Impala here.
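
The query above is an ordinary Java string concatenation evaluated when the job runs. The following sketch shows how it resolves; the Context class is a simplified stand-in for the object Talend provides, and the table name "my_db.my_table" is a hypothetical value. Only the variable name Table_Impala comes from this article.

```java
class ImpalaQuerySketch {

    // Simplified stand-in for the job's context; only Table_Impala is taken from the article.
    static class Context {
        final String Table_Impala;

        Context(String tableImpala) {
            this.Table_Impala = tableImpala;
        }
    }

    // Same expression as entered in the query field of "tImpalaInput".
    static String buildQuery(Context context) {
        return "SELECT * FROM " + context.Table_Impala + " LIMIT 10";
    }

    public static void main(String[] args) {
        // Hypothetical context value; in Talend it comes from the context group.
        Context context = new Context("my_db.my_table");
        System.out.println(buildQuery(context));
    }
}
```

Keeping the table name in a context variable lets the same job run against another table or environment without editing the component.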

  • Double click on the component "tMongoDBOutput":
    • Tick "Use an existing connection"
    • Enter the collection name

  • Run the job