Saagie provides a default anonymization process written in Scala. The source code is available at https://github.com/saagie/outis.

You can use it as is, modify it, or replace it with a process of your own.

To use it, build the JAR and create a Spark processing job on the platform with the following command line:

spark-submit \
--conf "spark.executor.extraJavaOptions='-Dlog4j.configuration=log4j.xml'" \
--conf spark.ui.showConsoleProgress=false \
--driver-java-options "-Dlog4j.configuration=log4j.xml" \
{file} -u hdfs_user -t metastore_url -d datagov.anonymization_user -p $ENV_VAR_PASSWORD datasetsToAnonymized_url callback_url

where:

hdfs_user = user that launches the job; this user must have write access to HDFS and access to Data Governance on the platform with at least the "Access all Datasets" right
metastore_url = URL of the Hive metastore (e.g. thrift://nn1:9083)
datagov_user = user that accesses Data Governance on the platform with the "Access all Datasets" right, passed with -d (written datagov.anonymization_user in the command above; may be the same as hdfs_user)
datasetsToAnonymized_url = URL used to retrieve the datasets to anonymize (e.g. http://{IP_DATAGOVERNANCE}:{PORT}/api/v1/datagovernance/platform/{PLATFORM_ID}/privacy/datasets)
callback_url = URL called to report that a dataset has been anonymized (e.g. http://{IP_DATAGOVERNANCE}:{PORT}/api/v1/datagovernance/platform/{PLATFORM_ID}/privacy/events/datasetAnonymized)
$ENV_VAR_PASSWORD = environment variable that holds the password
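
The two Data Governance URLs share the same prefix and differ only in their last path segment, so they can be derived from one template. The following is a minimal sketch, not part of outis: the host, port and platform ID values are made-up placeholders, and the variable and function names (DATAGOV_HOST, DATAGOV_PORT, PLATFORM_ID, privacy_url) are hypothetical. It only prints the job arguments and does not launch anything.

```shell
#!/bin/sh
# Hypothetical values: replace them with those of your platform.
DATAGOV_HOST="10.0.0.1"   # stands in for {IP_DATAGOVERNANCE}
DATAGOV_PORT="8080"       # stands in for {PORT}
PLATFORM_ID="1"           # stands in for {PLATFORM_ID}

# Build a Data Governance privacy URL from the template shown in the
# parameter list; $1 is the last part of the path.
privacy_url() {
  printf 'http://%s:%s/api/v1/datagovernance/platform/%s/privacy/%s\n' \
    "$DATAGOV_HOST" "$DATAGOV_PORT" "$PLATFORM_ID" "$1"
}

DATASETS_URL=$(privacy_url "datasets")
CALLBACK_URL=$(privacy_url "events/datasetAnonymized")

# Dry run: print the arguments that follow {file} in the spark-submit command.
# hdfs_user and datagov_user remain placeholders, and the password is left
# as a literal reference to the environment variable so it is never printed.
JOB_ARGS="-u hdfs_user -t thrift://nn1:9083 -d datagov_user -p \$ENV_VAR_PASSWORD $DATASETS_URL $CALLBACK_URL"
echo "$JOB_ARGS"
```

The printed line can be appended after {file} in the command once the placeholder values have been replaced.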
