...

Saagie provides a default anonymization process written in Scala. The source code is available at https://github.com/saagie/outis.

You can use it as is, modify it, or replace it with a process of your own.

To use it, build the jar and create a Spark processing job on the platform with the following command line:

...

Managed types

String anonymization:
String fields are anonymized by substitution, character by character:
  • If the character is a digit, it is replaced by another digit
  • If the character is a letter, it is replaced by another letter
  • Otherwise, the character remains unchanged
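
The three substitution rules above can be illustrated with a minimal sketch. This is not the outis implementation: the object name `StringSubstitutionSketch` and its methods are hypothetical, and preserving the letter case is an assumption made here for readability. The replacement is drawn at random, so it may occasionally equal the original character.

```scala
import scala.util.Random

// Hypothetical illustration of the substitution rules; not the outis code.
object StringSubstitutionSketch {

  // Picks a random character in the inclusive range [first, last].
  private def randomBetween(first: Char, last: Char, rng: Random): Char =
    (first + rng.nextInt(last - first + 1)).toChar

  def anonymize(value: String, rng: Random = new Random()): String =
    value.map {
      // Rule 1: a digit is replaced by a random digit.
      case c if Character.isDigit(c) => randomBetween('0', '9', rng)
      // Rule 2: a letter is replaced by a random letter
      // (keeping upper/lower case is an assumption of this sketch).
      case c if Character.isLetter(c) && Character.isUpperCase(c) =>
        randomBetween('A', 'Z', rng)
      case c if Character.isLetter(c) => randomBetween('a', 'z', rng)
      // Rule 3: any other character is left unchanged.
      case c => c
    }
}
```

With this sketch, a value such as "AB-12 cd" keeps its length, its hyphen and its space, while every letter and digit is replaced.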
Date anonymization:
Date fields are replaced by a random value between January 1, 1920 and now:
  • If the field is a String type tagged as a Date type, a randomized date is generated in String format with the same pattern
  • If the field is a Timestamp type, a randomized Timestamp is generated
  • If the field is a Date type, a randomized Date is generated
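
As a rough sketch of the three cases above, the code below draws a random instant between January 1, 1920 and the current time, then exposes it as a formatted String, a Timestamp, or a Date. The object and method names are hypothetical and do not come from outis; using UTC for the lower bound and for formatting is an assumption of this sketch.

```scala
import java.sql.{Date, Timestamp}
import java.time.{Instant, LocalDate, ZoneOffset}
import java.time.format.DateTimeFormatter
import scala.util.Random

// Hypothetical illustration of date randomization; not the outis code.
object DateAnonymizationSketch {

  // Lower bound: January 1, 1920 at midnight (UTC is an assumption).
  private val lowerBoundMillis: Long =
    LocalDate.of(1920, 1, 1).atStartOfDay(ZoneOffset.UTC).toInstant.toEpochMilli

  // Random epoch milliseconds between the lower bound and now.
  def randomMillis(rng: Random): Long = {
    val now = System.currentTimeMillis()
    lowerBoundMillis + (rng.nextDouble() * (now - lowerBoundMillis)).toLong
  }

  // Timestamp field: a randomized Timestamp.
  def randomTimestamp(rng: Random): Timestamp = new Timestamp(randomMillis(rng))

  // Date field: a randomized Date.
  def randomDate(rng: Random): Date = new Date(randomMillis(rng))

  // String field tagged as a date: a randomized date rendered with the
  // same pattern as the original value (for example "yyyy-MM-dd").
  def randomDateString(pattern: String, rng: Random): String = {
    val formatter = DateTimeFormatter.ofPattern(pattern)
    Instant.ofEpochMilli(randomMillis(rng)).atZone(ZoneOffset.UTC).format(formatter)
  }
}
```

Sharing a single `randomMillis` helper keeps the three field types consistent about the allowed range.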

Numeric anonymization:
All numeric types are anonymized randomly. The generated value cannot exceed the

...

type's maximum value.
This covers the following types: Byte, Short, Int, Long, Float, Double and BigDecimal.
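
A minimal sketch of random generation bounded by each type's maximum value is shown below. It is an illustration only, not the outis code: the object and method names are hypothetical, generating only non-negative values is an assumption, and because BigDecimal has no built-in maximum, the `maxValue` parameter is an assumed caller-supplied bound.

```scala
import scala.util.Random

// Hypothetical illustration of numeric randomization; not the outis code.
// Assumption: values are drawn from 0 up to the maximum value of the type.
object NumericAnonymizationSketch {

  def randomByte(rng: Random): Byte = rng.nextInt(Byte.MaxValue + 1).toByte

  def randomShort(rng: Random): Short = rng.nextInt(Short.MaxValue + 1).toShort

  // nextInt(n) returns a value in [0, n), so the result stays below Int.MaxValue.
  def randomInt(rng: Random): Int = rng.nextInt(Int.MaxValue)

  // Clearing the sign bit keeps the value in [0, Long.MaxValue].
  def randomLong(rng: Random): Long = rng.nextLong() & Long.MaxValue

  def randomFloat(rng: Random): Float = rng.nextFloat() * Float.MaxValue

  def randomDouble(rng: Random): Double = rng.nextDouble() * Double.MaxValue

  // BigDecimal is unbounded, so the upper bound is supplied by the caller
  // (an assumption of this sketch).
  def randomBigDecimal(maxValue: BigDecimal, rng: Random): BigDecimal =
    BigDecimal(rng.nextDouble()) * maxValue
}
```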