BCV5

Masking Tool

More and more organizations use data masking to proactively protect their data, ensure legal compliance, avoid the massive cost of security breaches, or simply because it is good practice to protect PII (personally identifiable information) in test and QA environments. However, creating a data masking strategy for your organization and getting the process to work is hard.

The Masking Tool, which is a component of the Db2 for z/OS copy tool BCV5, enables you to implement your masking strategy by delivering dozens of masking algorithms in the form of Db2 user-defined functions right out of the box. These functions can generate artificial, but seemingly real data, such as names, addresses, credit card numbers, social security numbers, and so on. The generated data is plausible. For example, credit card numbers pass validity checks, and addresses have matching street names, zip codes, cities, and states.
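To illustrate what "pass validity checks" can mean for a credit card number, here is a minimal Python sketch that appends a Luhn check digit to a generated number. It is not the Masking Tool's implementation; the function names, the prefix "4", and the way the seed is turned into digits are assumptions made for illustration only.

```python
def luhn_check_digit(payload: str) -> int:
    """Return the Luhn check digit for a digit string that lacks one."""
    total = 0
    # Walk right to left; the rightmost payload digit is doubled first.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def is_luhn_valid(number: str) -> bool:
    """Check a complete number, including its trailing check digit."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def make_card_number(seed: int, prefix: str = "4", length: int = 16) -> str:
    """Build a card-shaped number from a seed (hypothetical scheme)."""
    body_len = length - len(prefix) - 1
    body = str(seed % 10**body_len).zfill(body_len)
    payload = prefix + body
    return payload + str(luhn_check_digit(payload))
```

A number built this way has the right length and passes the same checksum test that input validation in many applications applies, so masked test data is not rejected at the front door.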

All of the Masking Tool functions generate masked data based on an input value. The input value can be an arbitrary string or number. It is reduced to a single numeric value by a hashing algorithm. This numeric value then serves as a seed for a generator. Some data types, such as social security numbers or credit card numbers, can be generated directly from the seed value through mathematical operations. Other types of data, like names or addresses, are picked from a set of lookup tables. The Masking Tool comes with several pre-defined lookup tables that contain thousands of names and millions of addresses in different languages.
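The flow described above (input value, hash, seed, generator or lookup table) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the product's code: SHA-256 stands in for the unspecified hashing algorithm, the lookup table is a tiny hypothetical one, and the function names are invented for this example.

```python
import hashlib

# Hypothetical, tiny lookup table; the shipped tables are far larger.
FIRST_NAMES = ["Alice", "Bruno", "Chen", "Dana", "Emil", "Fatima"]


def seed_from(value) -> int:
    """Reduce an arbitrary string or number to one non-negative integer."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def mask_first_name(value) -> str:
    """Lookup-based masking: the seed selects an entry from a table."""
    return FIRST_NAMES[seed_from(value) % len(FIRST_NAMES)]


def mask_ssn(value) -> str:
    """Arithmetic masking: derive a nine-digit, SSN-shaped string from the seed."""
    digits = f"{seed_from(value) % 10**9:09d}"
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"
```

Both functions share the same first step and differ only in how the seed is turned into output, which mirrors the split between directly computed data types and lookup-based ones.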

By using a hashing algorithm, the Masking Tool ensures that the masking process can be repeated with the same results. In other words, the same input value will always result in the same masked value. There is no randomness to the masked data. This is beneficial for testers because tests need to run repeatedly and they should run with the same preconditions. As a side benefit, Db2 can also cache masked values to reduce CPU consumption.
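Because the result depends only on the input, repeated values can be served from a cache instead of being recomputed. The Python sketch below illustrates the idea with `functools.lru_cache`; it says nothing about how Db2 caches function results internally, and `mask_customer_id` is a hypothetical masking function.

```python
import hashlib
from functools import lru_cache


@lru_cache(maxsize=None)
def mask_customer_id(value: str) -> str:
    """Hypothetical deterministic masking function: same input, same output."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return "C" + str(int(digest[:12], 16) % 10**8).zfill(8)


# The same source value appears several times, as it would in a foreign key column.
rows = ["A-100", "B-200", "A-100", "A-100"]
masked = [mask_customer_id(r) for r in rows]
```

Only the two distinct inputs are actually hashed; the repeated "A-100" rows are answered from the cache, and all three of them receive the same masked value, which keeps key relationships between tables intact.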

At the same time, it is not possible to calculate the original value by using the known masked value. This is a big advantage compared to masking strategies that work by shifting letters or digits.
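For contrast, the following Python sketch shows why a shift-based scheme is weak: anyone who knows or guesses the shift can undo it. The function name and the sample value are hypothetical.

```python
def shift_digits(value: str, k: int) -> str:
    """Shift every digit by k positions (mod 10); other characters are kept."""
    return "".join(
        str((int(c) + k) % 10) if c.isdigit() else c
        for c in value
    )


original = "123-45-6789"
shifted = shift_digits(original, 3)    # "masked" by shifting
recovered = shift_digits(shifted, -3)  # trivially reversed
```

Here `recovered` equals `original` again. An attacker does not even need to know the shift, because there are only ten possible values to try.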

Data types that can be generated by the set of masking functions include:

  • First names, last names
  • Postal addresses (street, house number, city, zip code, state, country)
  • E-mail addresses
  • Social security numbers (SSN/SIN)
  • Credit card numbers
  • UUIDs
  • Dates
  • Bank names and routing numbers
  • International bank account numbers
  • Pattern-based strings (company-specific customer IDs, license plate numbers, etc.)
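As an illustration of the last item, a pattern-based generator could work roughly as in the Python sketch below. The pattern syntax ('A' for an uppercase letter, '9' for a digit, everything else copied literally) and the function name are assumptions made for this example and are not the Masking Tool's actual pattern language.

```python
import hashlib
import string


def mask_by_pattern(value: str, pattern: str) -> str:
    """Fill a pattern deterministically from a hash of the input value."""
    seed = int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest(), "big")
    out = []
    for ch in pattern:
        if ch == "A":
            # Consume part of the seed to pick one of 26 letters.
            seed, r = divmod(seed, 26)
            out.append(string.ascii_uppercase[r])
        elif ch == "9":
            # Consume part of the seed to pick one of 10 digits.
            seed, r = divmod(seed, 10)
            out.append(string.digits[r])
        else:
            out.append(ch)  # literal character such as '-' or ' '
    return "".join(out)
```

Calling `mask_by_pattern("B-XY 1234", "AA-999-AA")` yields a string shaped like a license plate, and the same input always yields the same output.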

The Masking Tool functions are either compiled SQL scalar functions or inlined SQL scalar functions, written in SQL PL. This makes the functions easy to customize because there is no need to compile or link any code. They are also easier to manage because there are no external load modules for the DBAs to maintain. Finally, these functions run in the DBM1 address space, so no task switching is required when they are called. Since masking functions are called for every row in a table, this results in a significant performance advantage compared to external functions.

A set of rules is used to specify which columns of which tables should be masked. The rules are evaluated at run time, and the Masking Tool will automatically identify the involved data types and perform all necessary casting operations. You can have a separate set of rules for each Db2 subsystem that you work with. Depending on your requirements, you can either mask data while making a copy of your tables, or you can mask data in-place. The first option is useful when copying data from a production environment into a test or QA system. The second option allows you to modify the contents of an existing set of tables without making another copy. This can be used to mask data in a pre-production environment that was created by making a 1:1 copy of a production system.
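Conceptually, a rule set maps table and column names to masking functions, and the masked result is cast back to the column's data type. The Python sketch below models this idea only; the rule format, table and column names, and helper functions are hypothetical and do not reflect the Masking Tool's actual rule syntax.

```python
import hashlib

LAST_NAMES = ["Keller", "Lopez", "Nakamura", "Olsen", "Patel"]  # hypothetical table


def _seed(value) -> int:
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def mask_last_name(value) -> str:
    return LAST_NAMES[_seed(value) % len(LAST_NAMES)]


def mask_account_number(value) -> str:
    return str(_seed(value) % 10**8)


# Rule set: (table, column) -> masking function. Unlisted columns are copied as-is.
RULES = {
    ("CUSTOMER", "LAST_NAME"): mask_last_name,
    ("CUSTOMER", "ACCOUNT_NO"): mask_account_number,
}


def mask_row(table: str, row: dict) -> dict:
    """Apply the rules to one row, casting results back to the source type."""
    masked = {}
    for column, value in row.items():
        rule = RULES.get((table, column))
        if rule is None or value is None:
            masked[column] = value
        else:
            masked[column] = type(value)(rule(value))
    return masked
```

The same `mask_row` function could be applied while rows are copied to a target system or to rows that are updated in place; only the surrounding read and write logic would differ.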

BCV5's Masking Tool gives you a set of powerful instruments that enables you to implement your data masking strategy in a consistent, reliable and secure way.