Spark Scala - Read & Write files from Hive

Github Project : example-spark-scala-read-and-write-from-hive

Common part

sbt Dependencies

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0" % "provided"

assembly Dependency

// In build.sbt
import sbt.Keys._
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
// In project/assembly.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.1")
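With the plugin in place, the fat jar is built and submitted from the command line. A minimal sketch, assuming the default sbt layout; the main class name and the Scala/jar version in the path are assumptions, not values from the project:

```shell
# Build the fat jar (Spark dependencies are excluded because they are "provided")
sbt clean assembly

# Submit it to a cluster; the class name and jar path below are hypothetical
spark-submit \
  --class com.example.Main \
  --master yarn \
  target/scala-2.11/example-spark-scala-read-and-write-from-hive-assembly-1.0.jar
```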

HDFS URI

HDFS URIs have the form: hdfs://namenodedns:port

The default namenode port is 8020.
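For example, a full HDFS path is built by appending the file path to the namenode URI. A small sketch, where the namenode hostname is hypothetical:

```scala
// Hypothetical namenode host; 8020 is the default namenode RPC port
val hdfsUri = "hdfs://namenode.example.com:8020"

// A full path to a warehouse location then looks like:
val path = s"$hdfsUri/user/hive/warehouse/helloworld"
```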

Init SparkSession

// Creation of the SparkSession (since Spark 2.0, it replaces SparkContext and HiveContext)
import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder()
  .appName("example-spark-scala-read-and-write-from-hive")
  .config("hive.metastore.warehouse.dir", params.hiveHost + "user/hive/warehouse")
  .enableHiveSupport()
  .getOrCreate()
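Note that since Spark 2.0 the preferred property for the warehouse location is spark.sql.warehouse.dir; hive.metastore.warehouse.dir is deprecated in its favour. A sketch of the same builder using it, with a hypothetical HDFS URI in place of params.hiveHost:

```scala
import org.apache.spark.sql.SparkSession

// spark.sql.warehouse.dir supersedes hive.metastore.warehouse.dir in Spark 2.x
val sparkSession = SparkSession.builder()
  .appName("example-spark-scala-read-and-write-from-hive")
  .config("spark.sql.warehouse.dir", "hdfs://namenode.example.com:8020/user/hive/warehouse") // hypothetical URI
  .enableHiveSupport()
  .getOrCreate()
```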

How to write to a Hive table with Spark Scala?

Code example

// Case class matching the table schema; in real code it must be defined at the top level
case class HelloWorld(message: String)

// ====== Creating a DataFrame with 1 partition
import sparkSession.implicits._
val df = Seq(HelloWorld("helloworld")).toDF().coalesce(1)

// ======= Writing files
// Writing the DataFrame as a Hive table
import org.apache.spark.sql.SaveMode
import sparkSession.sql

sql("DROP TABLE IF EXISTS helloworld")
sql("CREATE TABLE helloworld (message STRING)")
df.write.mode(SaveMode.Overwrite).saveAsTable("helloworld")
logger.info("Writing hive table : OK")
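If the table should keep its existing rows instead of being replaced, the DataFrame can be appended with insertInto. A hedged sketch of that alternative (it requires a Spark runtime to execute):

```scala
import org.apache.spark.sql.SaveMode

// Alternative: append rows into the existing Hive table instead of replacing it.
// insertInto resolves columns by position, so the DataFrame schema must match the table's.
df.write.mode(SaveMode.Append).insertInto("helloworld")
```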

How to read from a Hive table with Spark Scala?

Code example

// ======= Reading files
// Reading the Hive table into a Spark DataFrame
val dfHive = sql("SELECT * FROM helloworld")
logger.info("Reading hive table : OK")
dfHive.show() // show() prints to stdout and returns Unit, so it cannot be passed to the logger
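The same read can be expressed with the DataFrame API rather than a SQL string. A short sketch, again assuming a running Spark session:

```scala
// Equivalent read through the catalog instead of a SQL string
val dfTable = sparkSession.table("helloworld")

// Filters and projections can then be written as column expressions
import sparkSession.implicits._
val hellos = dfTable.filter($"message" === "helloworld")
hellos.show()
```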