Gist Page : example-python-read-and-write-from-impala-with-security
The script below does not work with thrift-sasl 0.3.0; it works only with thrift-sasl 0.2.1.
Add thrift-sasl==0.2.1 to your requirements.txt file.
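Since the script depends on this exact pin, it can help to check the installed version before trying to connect. The following is a minimal sketch, assuming Python 3.8 or later (for importlib.metadata); the helper names installed_thrift_sasl_version and is_supported_thrift_sasl are hypothetical and not part of the original script.

```python
from importlib import metadata
from typing import Optional

# Version this page reports as working with the script.
SUPPORTED_THRIFT_SASL = "0.2.1"


def installed_thrift_sasl_version() -> Optional[str]:
    """Return the installed thrift-sasl version, or None if it is not installed."""
    try:
        return metadata.version("thrift-sasl")
    except metadata.PackageNotFoundError:
        return None


def is_supported_thrift_sasl(version: Optional[str]) -> bool:
    """True only for the version this page reports as working."""
    return version == SUPPORTED_THRIFT_SASL
```

Calling is_supported_thrift_sasl(installed_thrift_sasl_version()) at the top of the script gives an early, readable failure point instead of an error deep inside the connection code.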
import ibis
import pandas as pd
import os
WebHDFS URIs look like this: http://namenodedns:port/user/hdfs/folder/file.csv
The default WebHDFS port is 50070.
The default Impala port is 21050.
# Connecting to Impala by providing the Impala host IP and port (21050 by default), credentials and a WebHDFS client
hdfs = ibis.hdfs_connect(host=os.environ['IP_HDFS'], port=50070)
client = ibis.impala.connect(host=os.environ['IP_IMPALA'], port=21050,
                             hdfs_client=hdfs,
                             user=os.environ['LDAP_USER'],
                             password=os.environ['LDAP_PASSWORD'],
                             auth_mechanism='PLAIN')
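The connection code reads four environment variables, and a missing one surfaces only as a bare KeyError. As a hedged sketch, the settings could be validated up front; read_connection_settings and REQUIRED_VARS are hypothetical names introduced here for illustration, while the variable names themselves come from the script above.

```python
import os
from typing import Dict, Mapping

# Environment variables read by the connection code.
REQUIRED_VARS = ("IP_HDFS", "IP_IMPALA", "LDAP_USER", "LDAP_PASSWORD")


def read_connection_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required settings, or raise one error naming every missing variable."""
    # Treat unset and empty variables the same way.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

The returned dictionary can then be used in place of the direct os.environ lookups when calling ibis.hdfs_connect() and ibis.impala.connect().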
Impala over SSL
If your Impala is secured with SSL, you have to add the following parameters to your ibis.impala.connect() command:
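A sketch of what this can look like, assuming the parameters in question are the use_ssl and ca_cert keyword arguments of ibis.impala.connect(). The helper impala_ssl_kwargs is a hypothetical name used only for illustration, and whether a CA certificate path is required depends on how your cluster is set up.

```python
from typing import Any, Dict, Optional


def impala_ssl_kwargs(ca_cert: Optional[str] = None) -> Dict[str, Any]:
    """Extra keyword arguments for an SSL-secured Impala (assumed parameter names)."""
    # Assumption: use_ssl and ca_cert are the parameters meant by this page.
    # ca_cert is the path to a CA certificate file, if your setup needs one.
    return {"use_ssl": True, "ca_cert": ca_cert}


# Usage (needs ibis and a reachable Impala, so it is left commented out):
# client = ibis.impala.connect(host=os.environ['IP_IMPALA'], port=21050,
#                              hdfs_client=hdfs,
#                              user=os.environ['LDAP_USER'],
#                              password=os.environ['LDAP_PASSWORD'],
#                              auth_mechanism='PLAIN',
#                              **impala_ssl_kwargs())
```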
# Creating a simple pandas DataFrame with two columns
liste_hello = ['hello1', 'hello2']
liste_world = ['world1', 'world2']
df = pd.DataFrame(data={'hello': liste_hello, 'world': liste_world})

# Writing the DataFrame to Impala if the table name doesn't exist
db = client.database('default')
if not client.exists_table('helloworld'):
    db.create_table('helloworld', df)
t = db['helloworld']
t.execute()
# ====== Reading table ======
# Selecting data with a SQL query
# limit=None to get the whole table, otherwise only the first 10000 lines are returned
requete = client.sql('select * from helloworld')
df = requete.execute(limit=None)
# Write the join between tables A and B into table C
client.raw_sql('CREATE TABLE c STORED AS PARQUET AS SELECT a.col1, b.col2 FROM a INNER JOIN b ON (a.id=b.id)')
# No data is returned to Python