Python - Read & Write files from Hive with Security

Gist page: example-python-read-and-write-from-hive-with-security

Common setup

Library dependencies and configuration

import ibis
import pandas as pd
import os

# ====== Ibis conf (to avoid a bug) ======
# Set the database Ibis uses for its temporary tables on the Impala/HiveServer2 backend
with ibis.config.config_prefix('impala'):
    ibis.config.set_option('temp_db', '`__ibis_tmp`')

WebHDFS URI

WebHDFS URIs look like this: http://namenodedns:port/user/hdfs/folder/file.csv

The default port is 50070 (the NameNode HTTP port in Hadoop 2.x; Hadoop 3 moved it to 9870).
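When talking to WebHDFS over plain HTTP rather than through Ibis, the REST endpoints live under the `/webhdfs/v1` prefix on the NameNode's HTTP port, and the operation is passed as an `op` query parameter (for example `op=OPEN` to read a file). The helper below is a minimal sketch under that assumption: `webhdfs_url` is a hypothetical name, it only builds the URL and sends no request.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op="OPEN", port=50070, **params):
    """Build a WebHDFS REST URL for an absolute HDFS path (hypothetical helper).

    No request is sent; extra keyword arguments become query parameters.
    """
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute, e.g. /user/hdfs/folder/file.csv")
    # The REST API is served under /webhdfs/v1 on the NameNode HTTP port
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


print(webhdfs_url("namenodedns", "/user/hdfs/folder/file.csv"))
# http://namenodedns:50070/webhdfs/v1/user/hdfs/folder/file.csv?op=OPEN
```

Spaces and other unsafe characters in the path are percent-encoded by `quote`. Authentication on a secured cluster is not handled by this sketch.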

Hive Connection

The default HiveServer2 port is 10000.

Connection

# Connect to Hive by providing the Hive host IP and port (10000 by default)
# plus a WebHDFS client (NameNode HTTP port, 50070 by default)
hdfs = ibis.hdfs_connect(host=os.environ['IP_HDFS'], port=50070)
client = ibis.impala.connect(host=os.environ['IP_HIVE'],
                             port=10000,
                             hdfs_client=hdfs,
                             user=os.environ['USER'],
                             password=os.environ['PASSWORD'],
                             # PLAIN: authenticate with the user name and password above
                             auth_mechanism='PLAIN')
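The connection code reads `IP_HDFS`, `IP_HIVE`, `USER` and `PASSWORD` straight from `os.environ`, so a missing variable only surfaces as a bare `KeyError` at connect time. A sketch of gathering and checking those settings up front is below; `hive_connection_settings` is a hypothetical helper, and the optional `HDFS_PORT` / `HIVE_PORT` overrides are assumptions, not part of the original example.

```python
import os


def hive_connection_settings(env=None):
    """Collect the connection parameters used above (hypothetical helper).

    IP_HDFS, IP_HIVE, USER and PASSWORD come from the example; HDFS_PORT and
    HIVE_PORT are assumed optional overrides falling back to 50070 and 10000.
    """
    env = os.environ if env is None else env
    required = ["IP_HDFS", "IP_HIVE", "USER", "PASSWORD"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "hdfs": {"host": env["IP_HDFS"], "port": int(env.get("HDFS_PORT", 50070))},
        "hive": {
            "host": env["IP_HIVE"],
            "port": int(env.get("HIVE_PORT", 10000)),
            "user": env["USER"],
            "password": env["PASSWORD"],
            "auth_mechanism": "PLAIN",
        },
    }
```

With this in place, the two connection calls can take their keyword arguments from the returned dictionaries, e.g. `ibis.hdfs_connect(**settings["hdfs"])` and `ibis.impala.connect(hdfs_client=hdfs, **settings["hive"])`.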

How to write a Hive table with Python?

Code example

# Create a simple pandas DataFrame with two columns
liste_hello = ['hello1', 'hello2']
liste_world = ['world1', 'world2']
df = pd.DataFrame(data={'hello': liste_hello, 'world': liste_world})

# Write the DataFrame to Hive only if the table doesn't exist yet
db = client.database('default')
if not client.exists_table('helloworld'):
    db.create_table('helloworld', df)

# Read the table back to check the write
t = db['helloworld']
t.execute()
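The "create only if absent" guard can be wrapped in a small function so it is reusable and testable without a cluster. The sketch below relies only on the calls the example already uses (`client.exists_table`, `client.database(...)` and `create_table`); `write_if_absent` and `FakeClient` are hypothetical names, and `FakeClient` is an in-memory stand-in for the Ibis client, not part of Ibis.

```python
import pandas as pd


def write_if_absent(client, table_name, df):
    """Create `table_name` in the `default` database from `df` unless it exists.

    Sketch built only on the calls used above. Returns True when a table was created.
    """
    if client.exists_table(table_name):
        return False
    client.database("default").create_table(table_name, df)
    return True


class FakeClient:
    """Hypothetical in-memory stand-in for the Ibis client, for offline testing."""

    def __init__(self):
        self.tables = {}

    def exists_table(self, name):
        return name in self.tables

    def database(self, name):
        return self  # a single namespace is enough for this sketch

    def create_table(self, name, df):
        self.tables[name] = df.copy()


df = pd.DataFrame({"hello": ["hello1", "hello2"], "world": ["world1", "world2"]})
fake = FakeClient()
print(write_if_absent(fake, "helloworld", df))  # True: table created
print(write_if_absent(fake, "helloworld", df))  # False: already there, left untouched
```

Against a real cluster, the same function would be called with the `client` created in the connection step.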

How to query a Hive table with Python?

Code example

# ====== Reading table ======
# Select data with a SQL query
requete = client.sql('select * from helloworld')
# limit=None fetches the whole table; otherwise only the first 10000 rows are returned
df = requete.execute(limit=None)
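When checking that the rows read back match the ones written, keep in mind that a `select *` without `ORDER BY` gives no guarantee on row order. A minimal sketch of an order-insensitive comparison with pandas follows; `same_rows` is a hypothetical helper name, and it assumes both frames use the same column names and dtypes.

```python
import pandas as pd


def same_rows(expected, actual):
    """Compare two DataFrames as unordered collections of rows (hypothetical helper).

    Both frames are sorted on all columns and re-indexed before comparing,
    because a query without ORDER BY may return rows in any order.
    """
    if sorted(expected.columns) != sorted(actual.columns):
        return False
    cols = sorted(expected.columns)
    left = expected[cols].sort_values(cols).reset_index(drop=True)
    right = actual[cols].sort_values(cols).reset_index(drop=True)
    return left.equals(right)


written = pd.DataFrame({"hello": ["hello1", "hello2"], "world": ["world1", "world2"]})
read_back = written.iloc[::-1]  # same rows, different order
print(same_rows(written, read_back))  # True
```

In the example above, `written` would be the DataFrame passed to `create_table` and `read_back` the result of `requete.execute(limit=None)`.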