Document datasets

  1. Click the Datasets tab.
  2. Select the dataset you want to classify.

  1. Classify the dataset in this window.

...

Add a comment to the dataset

Document personal data

To access the personal data panel:

...

Tag dataset containing personal data

  1. Check the "contains personal data" box if the dataset contains personal data.

Document the consent and/or configure the anonymization process

  1. Click Edit Settings to document the consent and/or configure the anonymization process.

...

Saagie provides a default anonymization process written in Scala. The source code is available at https://github.com/saagie/outis.

You can use it as is, modify it, or replace it with a process of your own.

To use it, build the jar and create a Spark processing job on the platform with the following command line:

...

spark-submit \
--conf "spark.executor.extraJavaOptions='-Dlog4j.configuration=log4j.xml'" \
--conf spark.ui.showConsoleProgress=false \
--driver-java-options "-Dlog4j.configuration=log4j.xml" \
{file} -u hdfs_user -t metastore_url -d datagov_user -p $ENV_VAR_PASSWORD datasetsToAnonymized_url callback_url

where:

  • hdfs_user = user that launches the job; this user must have write permission on HDFS
  • metastore_url = URL of the Hive metastore (e.g. thrift://nn1:9083)
  • datagov_user = user that accesses Data Governance on the platform; this user must have the "Access all datasets" right (may be the same as hdfs_user)
  • datasetsToAnonymized_url = URL from which to obtain the datasets to anonymize (e.g. http://{IP_DATAGOVERNANCE}:{PORT}/api/v1/datagovernance/platform/{PLATFORM_ID}/privacy/datasets)
  • callback_url = URL to notify once a dataset has been anonymized (e.g. http://{IP_DATAGOVERNANCE}:{PORT}/api/v1/datagovernance/platform/{PLATFORM_ID}/privacy/events/datasetAnonymized)
  • $ENV_VAR_PASSWORD = environment variable holding the password
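
The two URL arguments share the same prefix, so it can help to assemble them from variables before pasting them into the job command line. The following is only a minimal sketch: the variable names and the host, port, and platform ID values are hypothetical placeholders, not values taken from the platform. It prints the URLs for review and does not call spark-submit.

```shell
#!/bin/sh
# Hypothetical placeholder values; replace them with those of your platform.
DATAGOV_HOST="10.0.0.1"   # stands in for {IP_DATAGOVERNANCE}
DATAGOV_PORT="8080"       # stands in for {PORT}
PLATFORM_ID="1"           # stands in for {PLATFORM_ID}

# Common prefix of both privacy endpoints, following the examples in the parameter list.
PRIVACY_BASE="http://${DATAGOV_HOST}:${DATAGOV_PORT}/api/v1/datagovernance/platform/${PLATFORM_ID}/privacy"

DATASETS_URL="${PRIVACY_BASE}/datasets"                  # datasetsToAnonymized_url
CALLBACK_URL="${PRIVACY_BASE}/events/datasetAnonymized"  # callback_url

# Print the two trailing arguments so they can be checked before use.
echo "datasetsToAnonymized_url: ${DATASETS_URL}"
echo "callback_url: ${CALLBACK_URL}"
```

Building both URLs from one prefix keeps the host, port, and platform ID consistent between the endpoint that lists datasets and the one that receives the callback.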


You can download the latest version of the jar here:

outis-link-1.01.0.jar

Exception handling

A dataset is not anonymized if:

...