Critical Data Identifier

Various privacy regulations and laws require anonymization (de-identification) of files containing sensitive personal data before they are made available to user groups for secondary use (testing, QA, research). Examples include the Health Insurance Portability and Accountability Act (HIPAA, USA), the Children's Online Privacy Protection Act (COPPA), the European Privacy Directive 95/46/EC, the Belgian Privacy Act (De Privacywet, La Loi vie privée), the Dutch Data Protection Act (Wet bescherming persoonsgegevens, Wbp), and the German Federal Data Protection Act (BDSG).

In addition to the operational challenge of repeatedly anonymizing current production data, it can be difficult even to determine which attributes must be anonymized, especially if relevant documentation is lacking or source code experts are unavailable or cannot assist.

It is obvious that attributes such as names, addresses, phone numbers, and email addresses, as well as various IDs, are worthy of protection in the above sense. But which columns from hundreds of tables belong in this category, and how are they interdependent? XDM supports anonymization, and its Critical Data Identifier (CDI) component identifies the relevant columns.

How it works
XDM is a tool designed for efficient and convenient test data deployment. Among other things, it copies data at different levels of granularity: database, table, and row level. XDM takes structural dependencies into account and integrates anonymization into the copy process. No unanonymized unload files are left in plain view, and the anonymization rules are unaffected by changes to the table structure.
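To illustrate the idea of anonymizing during the copy itself, here is a minimal Python sketch. It is not XDM's implementation: the names `pseudonymize`, `mask_email`, `RULES`, and `copy_rows` are hypothetical, and binding rules to column names (rather than column positions) is only one assumed way to keep rules stable when a table gains or reorders columns.

```python
import hashlib


def pseudonymize(value, salt="demo-salt"):
    """Map a value to a stable 12-character token (same input, same output)."""
    digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
    return digest[:12]


def mask_email(value):
    """Replace an e-mail address with a token at a reserved, non-routable domain."""
    return pseudonymize(value) + "@example.invalid"


# Hypothetical rule set: keyed by column name, so added or reordered
# columns in the source table do not invalidate the rules.
RULES = {"LAST_NAME": pseudonymize, "EMAIL": mask_email}


def copy_rows(source_rows, rules):
    """Yield copies of the rows with rules applied on the fly.

    Values are transformed while they are copied, so no unmodified
    intermediate copy of the data is written anywhere.
    """
    for row in source_rows:
        yield {
            col: (rules[col](val) if col in rules and val is not None else val)
            for col, val in row.items()
        }
```

Because the tokens are deterministic, equal source values map to equal targets, which keeps joins between copied tables consistent. Note that a salted hash of this kind is pseudonymization rather than full anonymization.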

The CDI (Critical Data Identifier) component detects personal data and related columns quickly and simply. Using various heuristics, XDM analyzes the data model, examines column contents, and filters for personal data.


CDI's analysis includes the following categories:

  • First and last name
  • Address with street, city and zip code
  • Phone and fax numbers
  • E-mail address
  • Banking information such as account number, BIC, IBAN, and name of the bank
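
The kind of heuristic described above, combining hints from the data model (column names) with a check of sampled column contents, can be sketched in Python as follows. This is an illustration only, not CDI's actual rule set: the category names, name hints, regular expressions, the 80% threshold, and the functions `classify_column` and `scan_table` are all assumptions made for this example.

```python
import re

# Hypothetical column-name hints per category (data model heuristic).
NAME_HINTS = {
    "email": ("mail",),
    "phone": ("phone", "fax", "tel"),
    "iban": ("iban",),
    "zip_code": ("zip", "postal"),
}

# Hypothetical, deliberately loose content patterns (content heuristic).
CONTENT_PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "phone": re.compile(r"\+?[\d\s()/-]{7,20}"),
    "iban": re.compile(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}"),
    "zip_code": re.compile(r"\d{4,5}"),
}


def classify_column(name, samples, threshold=0.8):
    """Return the set of categories a column plausibly belongs to."""
    hits = set()
    lowered = name.lower()
    # 1) Column name contains a hint for the category.
    for category, hints in NAME_HINTS.items():
        if any(hint in lowered for hint in hints):
            hits.add(category)
    # 2) At least `threshold` of the non-empty samples match the pattern.
    values = [str(v).strip() for v in samples if v is not None and str(v).strip()]
    if values:
        for category, pattern in CONTENT_PATTERNS.items():
            matched = sum(1 for v in values if pattern.fullmatch(v))
            if matched / len(values) >= threshold:
                hits.add(category)
    return hits


def scan_table(columns):
    """Build a report {column name: sorted categories} for flagged columns.

    `columns` maps each column name to a list of sampled values.
    """
    report = {}
    for name, samples in columns.items():
        categories = classify_column(name, samples)
        if categories:
            report[name] = sorted(categories)
    return report
```

Patterns this loose will produce false positives (a column of long numeric codes can look like phone numbers, for instance), so the output is best treated as a list of candidates for review rather than a final verdict.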

The result of the analysis is a report on the attributes to be protected. This report, which is available quickly, can be used as a basis for decision making and for developing modification rules. Users are free to define additional categories, such as customer, patient, or contract numbers and other IDs, and to adjust the category patterns to specific needs (for example, locales).

XDM and CDI drastically reduce the effort required for the analysis, design, and implementation of anonymization or pseudonymization. Fields that require privacy protection can be determined almost immediately. The seamless integration of CDI and XDM links column identification with rule definition and column allocation, and allows rules to be tested immediately. With XDM, the anonymization project becomes an integrated part of test data provisioning: anonymization simply takes place during the copy process, driven by a scheduler, as part of routine operations.