Lexicon

Anonymization

"Anonymization" of data means processing it with the aim of irreversibly preventing the identification of the individual to whom it relates. Data can be considered anonymised when it does not allow identification of the individuals to whom it relates, and it is not possible that any individual could be identified from the data by any further processing of that data or by processing it together with other information which is available or likely to be available.

Attribute name

In Saagie Data Governance, an attribute name is a name given to a field.

Category

In Saagie Data Governance, a category indicates that a field takes its values from a limited set.

Consent

Any freely given, specific, informed and unambiguous indication of the data subject's wishes by which they, either by a statement or by a clear affirmative action, signify agreement to the processing of personal data relating to them.

Data status

In Saagie Data Governance, the data status indicates the status of a dataset:

  • Raw data
  • Intermediate data
  • Final data
  • Not specified 

Database

A database is a collection of information that is organized so that it can be easily accessed, managed and updated.

Data is organized into rows, columns and tables, and it is indexed to make it easier to find relevant information.

Dataset

A dataset is a collection of related, discrete items of data that may be accessed individually or in combination or managed as a whole entity. A dataset is organized into some type of data structure.

A dataset can have one of three types:

  • TABLE
  • DIRECTORY
  • FILE


Domain

In Saagie Data Governance, domains are used to group datasets by theme. For example, they can correspond to the departments of the company. Domains make the data lake easier to explore.

Entity name(s)

In Saagie Data Governance, entity names are the names given to a dataset.

Entry Date

In Saagie Data Governance, the entry date of personal data is its registration date, that is, the date the data was created.

Field

In a database table, a field is a data structure for a single piece of data.

GDPR

The General Data Protection Regulation (GDPR) is a legal framework that sets guidelines for the collection and processing of personal information of individuals within the European Union (EU). The GDPR sets out the principles for data management and the rights of the individual, and it allows fines that can be based on revenue. It covers all companies that handle the data of individuals in the EU, so it is a critical regulation for corporate compliance officers at banks, insurers, and other financial companies. The GDPR came into effect across the EU on May 25, 2018.

Official Journal of the European Union

Journal officiel de l'Union Européenne

Global ranking

In Saagie Data Governance, the global ranking is a quality score for a dataset. It is the product of three factors: Global ranking = Trust Tag × Status Tag × Named Entity.

Trust Tag:

  • Verified Good: 2
  • Verified Bad: 0.4
  • In verification: 1
  • N/A: 0.8

Status Tag:

  • Final: 1.2
  • Intermediate: 1.0
  • Raw: 0.8
  • N/A: 0.7

Named Entity:

  • Entered by the user: 1.1
  • Not provided, empty or null: 1.0
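
The product of the three factors can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the multipliers are the ones listed above, while the dictionary names and the `global_ranking` function are hypothetical and are not part of any Saagie API.

```python
# Multipliers copied from the lexicon entry; all names below are illustrative.
TRUST_TAG = {
    "Verified Good": 2.0,
    "Verified Bad": 0.4,
    "In verification": 1.0,
    "N/A": 0.8,
}

STATUS_TAG = {
    "Final": 1.2,
    "Intermediate": 1.0,
    "Raw": 0.8,
    "N/A": 0.7,
}

NAMED_ENTITY_PROVIDED = 1.1  # an entity name was entered by the user
NAMED_ENTITY_MISSING = 1.0   # no name, or an empty or null name


def global_ranking(trust, status, entity_name=None):
    """Return Trust Tag x Status Tag x Named Entity for one dataset."""
    # An empty string or None counts as "no entity name".
    named = NAMED_ENTITY_PROVIDED if entity_name else NAMED_ENTITY_MISSING
    return TRUST_TAG[trust] * STATUS_TAG[status] * named


best = global_ranking("Verified Good", "Final", "customers")
worst = global_ranking("Verified Bad", "N/A")
```

With these multipliers, a verified-good, final dataset with an entity name scores 2 × 1.2 × 1.1 = 2.64, and a verified-bad dataset with no status and no entity name scores 0.4 × 0.7 × 1.0 = 0.28.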

Master data

Master data means that, for a given table, the field (attribute name) is the master and therefore holds the reference value.

Personal data

According to the law, personal data means any information relating to an identified or identifiable individual. An identifiable person is one who can be identified, directly or indirectly, in particular by reference to an identification number (e.g. social security number) or to one or more factors specific to their physical, physiological, mental, economic, cultural or social identity (e.g. last name and first name, date of birth, biometric data, fingerprints, DNA…).

Primary key

A primary key is a special relational database table column (or combination of columns) designated to uniquely identify all table records.
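
As a small illustration of the uniqueness guarantee, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `customer` table and its columns are hypothetical examples, not part of Saagie Data Governance.

```python
import sqlite3

# In-memory database, so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# customer_id is declared as the primary key of the (hypothetical) table.
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, email TEXT)"
)
conn.execute("INSERT INTO customer VALUES (1, 'a@example.com')")

# A second record with the same primary key value is rejected.
try:
    conn.execute("INSERT INTO customer VALUES (1, 'b@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

The database refuses the second insert, so each `customer_id` value identifies exactly one record.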

Provenance

Provenance is the source from which a dataset originates.

Pseudonymization

"Pseudonymization" of data means replacing any identifying characteristics of data with a pseudonym, or, in other words, a value which does not allow the data subject to be directly identified. 
Although pseudonymization has many uses, it should be distinguished from anonymization. In many cases it provides only limited protection for the identity of data subjects, because identification by indirect means remains possible. Where a pseudonym is used, it is often possible to identify the data subject by analysing the underlying or related data.
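
One common way to produce pseudonyms is a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library; it is an assumption chosen for illustration, not a description of how Saagie Data Governance pseudonymizes data. The `pseudonymize` function, the sample record and the key are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for value, computed with HMAC-SHA256."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical record and key, for illustration only.
record = {"name": "Jane Doe", "city": "Rouen", "birth_year": 1984}
key = b"example-key-keep-this-secret"

# Only the directly identifying field is replaced by its pseudonym.
pseudonymized = {**record, "name": pseudonymize(record["name"], key)}
```

The same name always maps to the same pseudonym, so records can still be linked to each other. The `city` and `birth_year` fields are left untouched, which is exactly the kind of related data that can still allow indirect identification.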

Secondary key

Fields in a table that are candidate keys but have not been selected as the primary key are referred to as secondary keys.

Sensitive Data

Personal data that reveals, directly or indirectly, a person's racial or ethnic origin, political, philosophical or religious opinions, or trade union affiliation, or that relates to their health or sex life.

Trust level

In Saagie Data Governance, the trust level indicates how far a dataset can be trusted:

  • Verified good
  • Verified bad
  • In verification
  • Not verified