...
Gist Page : example-python-read-and-write-from-hive-with-security
Common part
Library dependencies
Code Block |
---|
|
import ibis
import pandas as pd
import os |
WEBHDFS URI
A WebHDFS URI has the following form: http://namenodedns:port/user/hdfs/folder/file.csv
The default port is 50070.
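As a small illustration of the URI form described above, the following sketch assembles such a URI from its parts. The helper name `build_webhdfs_uri` is hypothetical and not part of ibis or any HDFS library; it only reproduces the pattern shown on this page.

```python
def build_webhdfs_uri(host, path, port=50070):
    """Build a URI of the form http://host:port/path (hypothetical helper)."""
    # Strip leading slashes so there is exactly one slash between port and path
    return "http://{}:{}/{}".format(host, port, path.lstrip("/"))


uri = build_webhdfs_uri("namenodedns", "/user/hdfs/folder/file.csv")
print(uri)
```

The port argument defaults to 50070 as stated above; pass another value if the cluster is configured differently.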
Hive Connection
Default port is 10000.
Connection
Code Block |
---|
|
# Connect to Hive by providing the Hive host IP and port (10000 by default) and a WebHDFS client
hdfs = ibis.hdfs_connect(host=os.environ['IP_HDFS'], port=50070)
client = ibis.impala.connect(host=os.environ['IP_HIVE'], port=10000, hdfs_client=hdfs, user=os.environ['USER'], password=os.environ['PASSWORD'], auth_mechanism='PLAIN')
|
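The connection code reads four environment variables (IP_HDFS, IP_HIVE, USER, PASSWORD). A minimal sketch of checking them before connecting is given here, so that a missing variable produces one clear error message. The function name `load_connection_settings` and the dictionary keys are assumptions made for illustration; only the variable names and default ports come from this page.

```python
import os

# Environment variables used by the connection code on this page
REQUIRED_VARS = ("IP_HDFS", "IP_HIVE", "USER", "PASSWORD")


def load_connection_settings(environ=os.environ):
    """Return connection settings, or raise KeyError listing missing variables."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {
        "hdfs_host": environ["IP_HDFS"],
        "hdfs_port": 50070,   # default WebHDFS port
        "hive_host": environ["IP_HIVE"],
        "hive_port": 10000,   # default Hive port
        "user": environ["USER"],
        "password": environ["PASSWORD"],
    }
```

The returned values can then be passed to `ibis.hdfs_connect` and `ibis.impala.connect` exactly as in the connection block above.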
How to write a Hive table with Python?
Code example
Code Block |
---|
|
# Creating a simple pandas DataFrame with two columns
liste_hello = ['hello1','hello2']
liste_world = ['world1','world2']
df = pd.DataFrame(data = {'hello' : liste_hello, 'world': liste_world})
# Writing the DataFrame to Hive only if the table does not already exist
db = client.database('default')
if not client.exists_table('helloworld'):
    db.create_table('helloworld', df)
t = db['helloworld']
t.execute() |
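The "create only if absent" step above could be wrapped in a small function, sketched below. The name `create_table_if_missing` is hypothetical; the function only relies on the two calls already used on this page (`client.exists_table` and `db.create_table`) and assumes they behave as in the example.

```python
def create_table_if_missing(client, db, name, df):
    """Create table `name` from `df` unless it already exists.

    Returns True if the table was created, False if it was already there.
    """
    if client.exists_table(name):
        return False
    db.create_table(name, df)
    return True
```

With the objects from the example, this would be called as `create_table_if_missing(client, client.database('default'), 'helloworld', df)`. Returning a boolean makes it easy to log whether existing data was left untouched.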
How to query a Hive table with Python?
Code example
Code Block |
---|
|
# ====== Reading table ======
# Selecting data with a SQL query
# limit=None returns the whole table; otherwise only the first 10000 rows are returned
requete = client.sql('select * from helloworld')
df = requete.execute(limit=None) |
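Because the row limit is easy to forget, one option is a thin wrapper that makes it an explicit argument. This is only a sketch: `run_query` is a hypothetical name, and it assumes `client.sql(...)` and `.execute(limit=...)` behave as in the example above.

```python
def run_query(client, query, limit=None):
    """Run a SQL query and return the result.

    limit=None fetches every row; an integer fetches at most that many rows.
    """
    expression = client.sql(query)
    return expression.execute(limit=limit)
```

For instance, `run_query(client, 'select * from helloworld', limit=5)` would fetch a short preview, while omitting `limit` fetches the whole table.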
...