Test data procurement – Three basic options

In the process of software development, tests need to be performed repeatedly. Depending on the stage of development, various test data, from individual test case data to bulk data, is required. There are three basic options for obtaining this test data.

1  The manual creation of test data

Most software developers have already used this option in practice. Before developing a new feature, they manually type fictitious test data into a database, which is then used for testing. However, this manual generation of test data is quite time-consuming and prone to errors. It is monotonous work that distracts from the actual development.

Complex structures cannot realistically be reproduced this way either. In most cases, manual test case data is created only for the specific feature that is currently being developed and about to be tested.

In his presentation at the Navigate congress, software architect Ulrich Lehner from LVM-Versicherungen (Insurance) paints an apt picture of this and compares the situation with a car headlight: “In the case of car headlights, only that which we are interested in while driving is illuminated at night. We only get to see the information that interests us now for the specific route, for the specific ride. All the surrounding information is not captured by the headlights, or at most by scattered light.”

In the same way, manual test data generation focuses only on the one aspect the feature currently deals with. Cross-references and other relations are not taken into account. But an overall picture is important for developing the feature and bringing it to production maturity. For this reason, and for those mentioned above, manually generated test data is not the best solution.

2  The creation of synthetic test data

Synthetic test data can be created automatically and in bulk. This is neither time-consuming nor monotonous for the developer, so at first sight it seems to be a good alternative for obtaining suitable test data.
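How such bulk generation works can be sketched in a few lines of Python. This is only a minimal illustration, not a production-grade generator: the function name generate_customers, the field names and the name lists are assumptions made up for this example.

```python
import random

# Hypothetical value pools, invented for this illustration.
FIRST_NAMES = ["Anna", "Ben", "Clara", "David"]
LAST_NAMES = ["Meyer", "Schmidt", "Fischer", "Weber"]


def generate_customers(count, seed=42):
    """Return `count` fictitious customer records, reproducible via `seed`."""
    rng = random.Random(seed)  # local generator, so results do not depend on global state
    customers = []
    for customer_id in range(1, count + 1):
        customers.append({
            "id": customer_id,
            "first_name": rng.choice(FIRST_NAMES),
            "last_name": rng.choice(LAST_NAMES),
            "birth_year": rng.randint(1940, 2005),
        })
    return customers


customers = generate_customers(1000)
print(len(customers))  # 1000
```

Because the generator is seeded, every run yields the same records, which keeps test runs reproducible. Each record is drawn independently from a small pool of values, with no relation to any other record.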

However, this method also has various disadvantages:

a) Low detail level of synthetic test data
Imagine a customer with all of their contracts. Many details are connected with this. For example, there is the agency where the contracts are held, and there are employees who received a commission for concluding each contract. So: customer, contracts, agency, employees, commissions … Such detailed relationships often cannot be generated in depth using synthetic test data.

b) Low consistency of synthetic test data
Even if coherent data and chains have been generated with a great deal of effort, there is usually a lack of consistency. This means that although the data is related within the subsection under consideration, it does not provide a consistent, coherent picture in the overall context of all data, as it would if real customer data were used.

c) Low diversification of synthetic test data
Another aspect is the low diversification: synthetic data is mostly generated for the main use cases. Rarely occurring edge cases are usually not considered. However, precisely these cases are important for testing, because in practice they can lead to major problems.

d) High sterility of synthetic test data
The term “synthetic” already implies the artificial nature of such test data. It is data from a “test tube”. It has not grown and has no history. Thus, it also has a high level of sterility. In the best case, it corresponds to the expected data consistency of the given system. It might be model-perfect, neat as a pin, comparable to the stereotypical family from an advertising poster. But the crucial question is: how realistic is such synthetic data?

Of course, one can try to account for some of these issues when generating synthetic data. Perhaps the level of detail or diversification can be increased somewhat … But the potential for improvement is limited by the complexity of the given structures and the number of possible combinations. The effort required to increase depth and range grows exponentially.
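A small calculation illustrates why the number of combinations gets out of hand. The sketch below assumes, purely for illustration, that each related entity (customer, contract, agency, employee, commission) comes in only five relevant variants; the function name combination_count and all figures are hypothetical.

```python
import math


def combination_count(variants_per_attribute):
    """Number of distinct combinations if every attribute varies independently."""
    return math.prod(variants_per_attribute)


# Assumption: 5 variants for each of 5 related entities.
print(combination_count([5] * 5))   # 3125

# Doubling the number of related entities to 10 does not double the count:
print(combination_count([5] * 10))  # 9765625
```

Covering every combination therefore quickly becomes impractical, even with modest assumptions about the number of variants.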

3  Conversion of production data into test data

The third option uses real production data to generate the required test data. For this purpose, the production data is copied 1:1 and thus automatically has the highest possible degree of realism and quality. This would immediately eliminate the previously described disadvantages in terms of level of detail, consistency and diversification. In addition, real data has a real history and is therefore far from sterile.

This data has actually “grown organically”. Maybe it was once created as an IMS table, then ended up in Db2 for z/OS and is now a Db2 LOB … In other words, the data has a life cycle behind it. And in the end, this is the high-quality test data that is needed for realistic testing of a feature.

Now there’s a catch: the requirements of the General Data Protection Regulation (GDPR). According to the law, data may only be used for the purpose for which it was originally collected. For example, customer data may only be used to process the contractual relationship, to support the customer and, if they have consented, to inform them about new products. Under no circumstances may the data simply be copied and used as test data.

However, the data may be used if all personal information has been removed or masked. This is done by means of pseudonymization. Software solutions such as XDM from the UBS Hainer TDM Suite offer precisely this kind of automated pseudonymization, which meets all the requirements of the GDPR and yet fulfills all the above-mentioned requirements for high-quality test data.
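To make the idea more concrete, here is a minimal sketch of one common pseudonymization technique, a keyed hash (HMAC). It does not describe how XDM works internally, and it is not by itself a statement about legal compliance. The key, the record layout and the function name pseudonymize are assumptions chosen for this example.

```python
import hashlib
import hmac

# Assumption: in practice the key would be stored securely, outside the test systems.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymize(value, key=SECRET_KEY, length=12):
    """Map a personal value to a stable pseudonym using HMAC-SHA256."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


# Hypothetical records that refer to the same person in two tables.
customer = {"id": 17, "name": "Erika Mustermann", "email": "erika@example.com"}
contract = {"contract_no": "C-1001", "customer_name": "Erika Mustermann"}

masked_customer = {
    **customer,
    "name": pseudonymize(customer["name"]),
    "email": pseudonymize(customer["email"]),
}
masked_contract = {
    **contract,
    "customer_name": pseudonymize(contract["customer_name"]),
}

# The same input always yields the same pseudonym, so references still match.
print(masked_customer["name"] == masked_contract["customer_name"])  # True
```

The mapping is deterministic, so relationships between tables survive the masking while the original values no longer appear in the test data. This preserves the consistency that makes production-derived data attractive in the first place.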

Do you have any questions about this topic? We would be happy to demonstrate our solutions based on your specific requirements.
