Spark Scala - Read & Write files from Hive
GitHub project: example-spark-scala-read-and-write-from-hive
Common part
sbt Dependencies
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0" % "provided"
// spark-hive is required for SparkSession.enableHiveSupport()
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.1.0" % "provided"
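Putting the dependencies together, a minimal build.sbt could look like the sketch below. The project name and scalaVersion are assumptions (Spark 2.1.0 is built against Scala 2.11):

```scala
// build.sbt -- a minimal sketch; name and scalaVersion are illustrative
name := "example-spark-scala-read-and-write-from-hive"

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.1.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.1.0" % "provided",
  // spark-hive is needed at runtime for SparkSession.enableHiveSupport()
  "org.apache.spark" %% "spark-hive" % "2.1.0" % "provided"
)
```

The "provided" scope keeps the Spark jars out of the assembly, since the cluster supplies them at runtime.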
assembly Dependency
// In build.sbt
import sbt.Keys._
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)

// In project/assembly.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.1")
HDFS URI
HDFS URIs have the form hdfs://namenodedns:port. The default port is 8020.
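As a quick sanity check, the pieces of such a URI can be inspected with java.net.URI from the standard library (the host below is a placeholder, not from the original text):

```scala
// Sketch: decomposing an HDFS URI; "namenode.example.com" is a placeholder host
import java.net.URI

val hdfsPath = new URI("hdfs://namenode.example.com:8020/user/hive/warehouse")

println(hdfsPath.getScheme) // hdfs
println(hdfsPath.getHost)   // namenode.example.com
println(hdfsPath.getPort)   // 8020
```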
Init SparkSession with Hive support
// Creation of the SparkSession
val sparkSession = SparkSession.builder()
  .appName("example-spark-scala-read-and-write-from-hive")
  .config("hive.metastore.warehouse.dir", params.hiveHost + "user/hive/warehouse")
  .enableHiveSupport()
  .getOrCreate()
How to write to a Hive table with Spark Scala?
Code example
// ====== Creating a dataframe with 1 partition
import sparkSession.implicits._
val df = Seq(HelloWorld("helloworld")).toDF().coalesce(1)

// ======= Writing files
// Writing the dataframe as a Hive table
import sparkSession.sql
import org.apache.spark.sql.SaveMode

sql("DROP TABLE IF EXISTS helloworld")
sql("CREATE TABLE helloworld (message STRING)")
df.write.mode(SaveMode.Overwrite).saveAsTable("helloworld")
logger.info("Writing hive table : OK")
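The snippet above relies on a HelloWorld case class; a minimal definition could look like this, where the single field name becomes the DataFrame column name and matches the Hive schema (message STRING):

```scala
// Minimal case class backing the DataFrame above; the field name "message"
// becomes the column name when the Seq is converted with toDF()
case class HelloWorld(message: String)

val row = HelloWorld("helloworld")
println(row.message) // helloworld
```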
How to read from a Hive table with Spark Scala?
Code example
// ======= Reading files
// Reading the Hive table into a Spark dataframe
val dfHive = sql("SELECT * FROM helloworld")
logger.info("Reading hive table : OK")
dfHive.show() // show() prints to stdout and returns Unit, so it is not passed to the logger