...
Code Block

```scala
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.6.10" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.6.10" % "provided"
libraryDependencies += "com.databricks" %% "spark-csv" % "1.3.0"
libraryDependencies += "org.mongodb.spark" %% "mongo-spark-connector" % "2.1.0.0"
```
assembly Dependency
Code Block

```scala
// In build.sbt
import sbt.Keys._
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
```
...
See the documentation for a list of all available partitioners: https://docs.mongodb.com/spark-connector/configuration/
Code Block

```scala
// Creation of the SparkSession
val sparkSession = SparkSession.builder()
  .appName("example-spark-scala-read-and-write-from-mongo")
  // Configuration for writing in a Mongo collection
  .config("spark.mongodb.output.uri", params.mongoUri)
  .config("spark.mongodb.output.collection", "restaurants")
  // Configuration for reading a Mongo collection
  .config("spark.mongodb.input.uri", params.mongoUri)
  .config("spark.mongodb.input.collection", "restaurants")
  // Type of partitioner used to transform documents into a dataframe
  .config("spark.mongodb.input.partitioner", "MongoPaginateByCountPartitioner")
  // Number of partitions in the resulting dataframe
  .config("spark.mongodb.input.partitionerOptions.MongoPaginateByCountPartitioner.numberOfPartitions", "1")
  .getOrCreate()
```
How to write a file to a Mongo collection with Spark Scala?
...
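As a rough sketch (not the tutorial's own code), writing a dataframe to the collection configured on the SparkSession can be done with `MongoSpark.save`. The `df` dataframe and the `spark.mongodb.output.*` settings are assumed to exist already, e.g. built as in the configuration section above:

```scala
import com.mongodb.spark.MongoSpark
import org.apache.spark.sql.DataFrame

// Assumes `df` was built elsewhere (e.g. loaded from a CSV with spark-csv)
// and that spark.mongodb.output.uri / output.collection are set on the session.
def writeToMongo(df: DataFrame): Unit = {
  // Persists each row of the dataframe as a document in the configured collection
  MongoSpark.save(df)
}
```

Each row of the dataframe becomes one document in the target collection, with column names mapped to field names.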
Code Block

```scala
// Reading a MongoDB collection into a dataframe
val df = MongoSpark.load(sparkSession)
df.show()
logger.info("Reading documents from Mongo : OK")
```

Note that `df.show()` prints the dataframe to standard output and returns `Unit`, so it cannot be passed as an argument to `logger.info`.
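To read from a collection other than the one configured on the SparkSession, the connector also accepts an explicit `ReadConfig`. A minimal sketch, assuming a `sparkSession` built as in the configuration section (the collection name `restaurants_bis` is illustrative):

```scala
import com.mongodb.spark.MongoSpark
import com.mongodb.spark.config.ReadConfig

// Override only the collection; every other setting falls back to the session config
val readConfig = ReadConfig(
  Map("collection" -> "restaurants_bis"),
  Some(ReadConfig(sparkSession))
)
val otherDf = MongoSpark.load(sparkSession, readConfig)
```

This avoids rebuilding a SparkSession just to point at a second collection in the same database.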
...