...

Code Block
languagescala
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.6.10" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.6.10" % "provided"
libraryDependencies += "com.databricks" %% "spark-csv" %
"1.3.0"
libraryDependencies += "org.mongodb.spark" %% "mongo-spark-connector" % "2.1.0.0"

Assembly dependency

Code Block
languagescala
// In build.sbt
import sbt.Keys._
// Do not bundle the Scala library in the fat jar: Spark already provides it at runtime
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
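
The assemblyOption key above is provided by the sbt-assembly plugin. A minimal sketch of its declaration in project/plugins.sbt, assuming sbt-assembly 0.14.x (the exact version is an assumption, pick the release matching your sbt):

Code Block
languagescala
// In project/plugins.sbt (0.14.5 is an assumed version)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")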

...

See the MongoDB Spark Connector documentation for the list of all available partitioners: https://docs.mongodb.com/spark-connector/configuration/

Code Block
languagescala
import org.apache.spark.sql.SparkSession

// Creation of the SparkSession
val sparkSession = SparkSession.builder()
  .appName("example-spark-scala-read-and-write-from-mongo")
  // Configuration for writing to a Mongo collection
  .config("spark.mongodb.output.uri", params.mongoUri)
  .config("spark.mongodb.output.collection", "restaurants")
  // Configuration for reading a Mongo collection
  .config("spark.mongodb.input.uri", params.mongoUri)
  .config("spark.mongodb.input.collection", "restaurants")
  // Type of partitioner used to split the collection into dataframe partitions
  .config("spark.mongodb.input.partitioner", "MongoPaginateByCountPartitioner")
  // Number of partitions in the resulting dataframe
  .config("spark.mongodb.input.partitionerOptions.MongoPaginateByCountPartitioner.numberOfPartitions", "1")
  .getOrCreate()
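
The partitioner can also be overridden for a single read instead of globally on the SparkSession. A minimal sketch, assuming mongo-spark-connector 2.1 and the sparkSession built above, using a ReadConfig that falls back to the session configuration for anything not overridden:

Code Block
languagescala
import com.mongodb.spark.MongoSpark
import com.mongodb.spark.config.ReadConfig

// Override the collection and partitioner for this read only
val readConfig = ReadConfig(
  Map(
    "collection" -> "restaurants",
    "partitioner" -> "MongoSamplePartitioner"
  ),
  Some(ReadConfig(sparkSession))
)
val dfSample = MongoSpark.load(sparkSession, readConfig)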

How to write a file to a Mongo collection with Spark Scala?
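
A minimal sketch of the write path, assuming a dataframe built from a CSV file (params.csvPath is a hypothetical parameter) and the output URI and collection already configured on the SparkSession above; MongoSpark.save writes the rows of the dataframe as documents of the restaurants collection:

Code Block
languagescala
import com.mongodb.spark.MongoSpark

// Build a dataframe from a CSV file (params.csvPath is an assumed parameter)
val restaurantsDf = sparkSession.read
  .option("header", "true")
  .csv(params.csvPath)

// Write the dataframe to the collection set in spark.mongodb.output.collection
MongoSpark.save(restaurantsDf)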

...

Code Block
languagescala
import com.mongodb.spark.MongoSpark

// Reading the MongoDB collection into a dataframe
val df = MongoSpark.load(sparkSession)
df.show()
logger.info("Reading documents from Mongo: OK")

...