Python - Write figure to HDFS

Gist Page : example-python-write-figure-to-hdfs

Common part

Library dependencies

from hdfs import InsecureClient
import os
import io

import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

WEBHDFS URI

A WebHDFS URI looks like this: http://namenodedns:port/user/hdfs/folder/file.csv

The default WebHDFS port is 50070 on Hadoop 2.x (Hadoop 3.x uses 9870).
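To make the URI layout concrete, here is a minimal sketch that splits a WebHDFS-style URI into the base URL (the part a client connects to) and the HDFS path. The helper name split_webhdfs_uri is hypothetical and only uses the standard library; it is not part of the hdfs package.

```python
from urllib.parse import urlsplit


def split_webhdfs_uri(uri):
    """Split a WebHDFS-style URI into (base_url, hdfs_path).

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(uri)
    # scheme + host:port is what the client connects to
    base_url = f"{parts.scheme}://{parts.netloc}"
    # the remainder is the path inside HDFS
    return base_url, parts.path


base_url, hdfs_path = split_webhdfs_uri(
    "http://namenodedns:50070/user/hdfs/folder/file.csv"
)
print(base_url)   # http://namenodedns:50070
print(hdfs_path)  # /user/hdfs/folder/file.csv
```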

Connection

# Connect to WebHDFS using the HDFS host IP and the WebHDFS port (50070 by default)
client_hdfs = InsecureClient('http://' + os.environ['IP_HDFS'] + ':50070')
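If the port differs between clusters, the connection URL can be built from the environment instead of being hard-coded. The sketch below assumes the same IP_HDFS variable as the connection snippet; PORT_HDFS and the function name webhdfs_url are hypothetical and introduced only for illustration.

```python
import os


def webhdfs_url(env=os.environ, default_port=50070):
    """Build the WebHDFS base URL from environment-style settings.

    IP_HDFS is the variable used in the connection snippet;
    PORT_HDFS is a hypothetical optional override for the port.
    """
    host = env["IP_HDFS"]  # raises KeyError if the variable is not set
    port = env.get("PORT_HDFS", default_port)
    return f"http://{host}:{port}"
```

The result can then be passed to the client, for example InsecureClient(webhdfs_url()).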

How to create a Matplotlib figure with Python?

Code example

# ====== Figure Creation ======
# Create the Matplotlib figure without displaying it (interactive mode off)
plt.ioff()
plt.figure(figsize=(15, 9))
plt.scatter(range(10), range(10), c=range(10), marker='o', s=500)
plt.title('Example of figure')
plt.tight_layout()

How to write a Matplotlib figure to HDFS with Python?

Code example

# ====== Writing to HDFS ======
# Render the figure as PNG into an in-memory BytesIO buffer,
# then stream the buffer's bytes to HDFS
buf = io.BytesIO()
plt.savefig(buf, format='png')
with client_hdfs.write('/user/hdfs/data/figure.png', overwrite=True) as writer:
    # getvalue() returns the whole buffer regardless of the current position
    writer.write(buf.getvalue())
buf.close()
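The same steps can be wrapped in a small reusable function. This is a sketch under stated assumptions, not part of the hdfs package: write_figure is a hypothetical name, client is any object whose write(path, overwrite=True) returns a context manager yielding a writer (as the client above does), and fig is any object with a Matplotlib-style savefig(buf, format=...) method.

```python
import io


def write_figure(client, fig, hdfs_path, fmt="png"):
    """Render fig into memory and write the bytes to hdfs_path.

    Hypothetical helper: returns the number of bytes written.
    """
    with io.BytesIO() as buf:
        # Render the figure into the in-memory buffer
        fig.savefig(buf, format=fmt)
        data = buf.getvalue()
    # Stream the rendered bytes to HDFS, replacing any existing file
    with client.write(hdfs_path, overwrite=True) as writer:
        writer.write(data)
    return len(data)
```

With the objects defined earlier, a call would look like write_figure(client_hdfs, plt.gcf(), '/user/hdfs/data/figure.png'), where plt.gcf() returns the current figure.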