Test Data Management

Thorough testing of new software applications is unavoidable. Although this statement is widely accepted, opinions on what “thorough” exactly means are deeply divided. To what degree is testing required, and at what cost does it remain affordable? Are functional unit tests sufficient, or is it necessary to conduct extensive stress tests of new releases before placing them into production? Since we are delivering to business areas, it all comes down to a reasonable ratio of expense to revenue. Risk must also be factored in, because the cost of testing should not exceed the cost of failure.

If a unit test of a newly developed application component fails, the cost is roughly one lost person-day in a developer’s schedule. Most companies allow for such days in their test schedules and development budgets. However, if the same component fails in production, it may affect your entire business and have dire long-term implications. The accepted standard is therefore to increase the intensity and quality of testing from stage to stage when moving from development to production environments. The concepts of unit, integration, and system tests follow the same thinking. At the lower levels of development, tests may focus on very specific functions of single components, using simple test cases that were developed at the same time as the software’s functionality was defined. As testing moves to higher levels of integration, the need for test procedures and test cases of near-production quality grows. For example, integration testing with multiple components of the system requires test cases with consistent data relations covering the integrated areas. At the highest level, system integration tests, the application has to be challenged with test cases that are comparable to real production data. It is not sufficient to make them “almost” real. They have to be the same as, or comparable to, production in content and volume. How else can you come close to guaranteeing that the application will not fail?

The behavior of software can be unpredictable and problematic. Knowing that a program works well at 10 transactions per second tells you nothing about how it reacts at 150 transactions per second. Knowing that it works when processing 1,000 records is not the same as knowing that it works with 1,000,000 records. The restriction or problem may lie with the operating system, the middleware, the DBMS or the application, but the outcome is the same: downtime with loss of revenue and potential disaster. If you want to know whether your application will work in production, you must test it as if it were in production. At this point, many people in charge of testing cite the commandment “Thou shalt not test in production” for reasons of safety and risk.

Of course nobody tests in production (that’s a given), but you can test the system to the level required by the production system.

When you look at the expense and requirements of testing new or updated applications, there are some simple practices we all follow. Most companies have test beds for developers and for some distinct test purposes. Deficiencies in the process are most often encountered in two areas: first, the lack of regular refreshes and data synchronization, and second, the use of inefficient and expensive copy processes. Test groups complain about unsuitable test beds and refresh intervals that wreck their test schedules. To understand the relevant issues in a test data landscape, you need to review the cascade of test environments. The source of this cascade is production and the second level is pre-production (pre-prod or QA). From there, data ‘flows’ to further test beds, normally masked, reduced and modified to satisfy specific test requirements. It is important to run a time-controlled refresh process, because data must be copied to the succeeding levels before tests on the current level potentially spoil it.
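The ordering constraint in the refresh cascade can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the environment names, the `CASCADE` mapping and the step labels `copy` and `release_for_test` are hypothetical and do not describe any particular tool.

```python
from collections import deque

# Hypothetical cascade: each environment lists the test beds it feeds.
CASCADE = {
    "production": ["pre-prod"],
    "pre-prod": ["integration", "unit", "developer"],
    "integration": ["training"],
    "unit": [],
    "developer": [],
    "training": [],
}


def refresh_plan(cascade, root="production"):
    """Return ordered steps in which every environment is copied onward
    before it is released for testing."""
    steps = []
    queue = deque([root])
    while queue:
        source = queue.popleft()
        # First distribute the freshly refreshed data to all successors.
        for target in cascade.get(source, []):
            steps.append(("copy", source, target))
            queue.append(target)
        # Only then may tests start to modify the data on this level.
        if source != root:
            steps.append(("release_for_test", source))
    return steps


plan = refresh_plan(CASCADE)
for step in plan:
    print(step)
```

The plan walks the cascade level by level, so every copy out of pre-prod appears before pre-prod itself is handed to the testers. A scheduler could execute the steps in exactly this order.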

Production data is best suited for meaningful tests, because it contains all the special cases and the quantity structures that programs have to cope with. On the other hand, each access to production poses a risk with respect to data privacy and data security. Hence, to satisfy auditors and risk managers, only a single, well-controlled, routinely executed process should access production data for the provision of test data. This regular pre-prod refresh must execute as fast as possible, with minimal impact on production. When you require consistent point-in-time data, there is always some impact on the source databases. You need to ensure that any such impact or pause lasts seconds rather than hours. A pause of hours would probably confine execution to certain weekends or even render it impossible. The process to provide a pre-prod environment

  • must be efficient to be affordable for regular, frequent deployment
  • must be fast with minimal impact on production
  • must be robust to allow for automatic, scheduler-driven execution
  • must accommodate any structural changes made to tables during the development cycle.

The last requirement sounds easier than it is. Testing newer versions requires the new table structures. To minimize human error, the copy tools should automatically adapt the production structures to those required in the test environment.
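As a rough illustration of such automatic structure adaptation, the following Python sketch compares two column layouts and derives the DDL statements needed to get from one to the other. The function name `alter_statements`, the table `CUSTOMER` and its columns are assumptions made up for this example, and the generated SQL uses a generic syntax that would have to be adjusted to the dialect of the DBMS in use.

```python
def alter_statements(table, source_cols, target_cols):
    """Compare two {column_name: sql_type} mappings and return the DDL that
    turns the source (production) layout into the target (new release) layout."""
    statements = []
    for name, sql_type in target_cols.items():
        if name not in source_cols:
            # Column exists only in the new release.
            statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
        elif source_cols[name] != sql_type:
            # Column exists in both layouts but with a different type.
            statements.append(
                f"ALTER TABLE {table} ALTER COLUMN {name} SET DATA TYPE {sql_type}"
            )
    for name in source_cols:
        if name not in target_cols:
            # Column was removed in the new release.
            statements.append(f"ALTER TABLE {table} DROP COLUMN {name}")
    return statements


# Hypothetical layouts of the same table in production and in the new release.
production = {"ID": "INTEGER", "NAME": "VARCHAR(40)", "FAX": "VARCHAR(20)"}
new_release = {"ID": "INTEGER", "NAME": "VARCHAR(80)", "EMAIL": "VARCHAR(120)"}

ddl = alter_statements("CUSTOMER", production, new_release)
for statement in ddl:
    print(statement)
```

A real copy tool would read both layouts from the database catalogs instead of from hand-written dictionaries, and it would also have to decide how existing data is converted when a column type changes.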

Reliable periodic refreshes of a pre-prod environment uncouple production from any test activities and provide a base for test data provision to any number of test beds. The environment can also be used for final release and performance tests. The data of pre-prod environments is usually not masked, because final tests require not only the full volume of data but also ‘the real thing’. This is usually not a problem because only trusted staff are involved in final tests. It does mean, of course, that access to the pre-prod environments must be restricted. Only these trusted staff and the verified copy procedure that distributes the data from pre-prod to other test beds should have access.

The next level of the test data cascade after pre-prod is usually formed by the integration test environments. Subsequent levels may be unit test environments, further special test beds for function testing, and finally test case data for development.

Integration environments mostly conform to pre-prod, that is, masking is waived and all objects (tables, databases) of the application are present. Unlike the other levels, unit test beds require only the data that the unit deals with; only tables that are accessed by the unit need to be present. Masking is required from this level on at the latest, if it has not already been applied for integration tests.
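One common way to mask key-like fields without breaking the relations between tables is pseudonymization with a keyed hash: the same input always yields the same pseudonym, so masked tables can still be joined. The sketch below uses Python's standard `hmac` and `hashlib` modules. The function name `pseudonymize`, the key and the sample account numbers are assumptions for illustration only, not a description of any specific masking product.

```python
import hashlib
import hmac


def pseudonymize(value, secret_key, length=12):
    """Map a sensitive value to a stable pseudonym using a keyed hash."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    # Truncate the hex digest to keep the pseudonym short.
    return digest.hexdigest()[:length]


# Hypothetical key; in practice it must be kept outside the test beds.
KEY = b"example-key-kept-outside-the-test-bed"

account_in_customer_table = pseudonymize("DE89 3704 0044 0532 0130 00", KEY)
account_in_payment_table = pseudonymize("DE89 3704 0044 0532 0130 00", KEY)
other_account = pseudonymize("DE12 5001 0517 0648 4898 90", KEY)

# The same source value gives the same pseudonym in every table.
print(account_in_customer_table == account_in_payment_table)
```

Note that the result is a hex string, so this approach does not preserve the format of the original field. Fields that must still pass format checks in the application need a masking function that produces values of the same shape.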

Developers generally want the data reduced further, so that only certain rows of certain tables are present, in order to keep test runs short. With masking and data reduction, further questions come into play: how is the selection accomplished, and how can the masking functions be formulated and selected? You must always ensure completeness, that is, relational integrity and hidden dependencies. Development managers invest huge effort in test case construction, selecting consistently and extracting the right data, which may then be modified further. An automated tool that eases this task and enforces data privacy rules (masking) can shorten test schedules considerably. It also reduces operational costs (CPU resources and manpower) by accomplishing the copy and the data field modifications required by the test case (including those for data protection) in a single step.
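To make the completeness requirement concrete, here is a minimal Python sketch of a reduction step that keeps relational integrity: it selects some rows from one table and then pulls in every parent row those rows reference. The tables, the `FOREIGN_KEYS` list and the function name `consistent_subset` are hypothetical, and the data lives in memory purely for illustration.

```python
# Hypothetical in-memory tables; a real tool would read from the DBMS.
TABLES = {
    "customer": [
        {"id": 1, "name": "Alice Smith"},
        {"id": 2, "name": "Bob Jones"},
        {"id": 3, "name": "Carol White"},
    ],
    "orders": [
        {"id": 10, "customer_id": 1, "amount": 250},
        {"id": 11, "customer_id": 3, "amount": 90},
        {"id": 12, "customer_id": 3, "amount": 1200},
    ],
}

# (child table, foreign-key column, parent table, parent key column)
FOREIGN_KEYS = [("orders", "customer_id", "customer", "id")]


def consistent_subset(tables, foreign_keys, start_table, predicate):
    """Select rows from start_table and add every parent row they reference,
    so that the reduced data set keeps its relational integrity."""
    subset = {start_table: [row for row in tables[start_table] if predicate(row)]}
    changed = True
    while changed:  # repeat until no further parent rows are missing
        changed = False
        for child, fk_col, parent, pk_col in foreign_keys:
            needed = {row[fk_col] for row in subset.get(child, [])}
            have = {row[pk_col] for row in subset.get(parent, [])}
            missing = [row for row in tables[parent]
                       if row[pk_col] in needed and row[pk_col] not in have]
            if missing:
                subset.setdefault(parent, []).extend(missing)
                changed = True
    return subset


small = consistent_subset(TABLES, FOREIGN_KEYS, "orders",
                          lambda row: row["amount"] > 100)
print(small)
```

The sketch follows only declared foreign keys from child to parent. Hidden dependencies, such as relations that are enforced by application logic rather than by the database, would have to be added to the key list by hand.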

Summary – Characterization of typical environments

 

Pre-production: one-to-one copy of production; data structure(s) modified to match the new structure(s); object rename possibly required (creator); field content unchanged, not masked; closed or sealed-off environment accessible to trusted staff only; refreshed periodically at appropriate intervals; requires a robust, efficient and inexpensive “copy” procedure, preferably a scheduler-driven, automatic, unattended process.

Integration test environments: allow software modules and components to be tested as a whole and within the application context; data structure(s) adapted to the new version; sensitive fields are possibly masked (anonymized or pseudonymized, see http://en.wikipedia.org/wiki/Pseudonymization); field contents might be modified to satisfy certain test requirements; data is present in full volume (whole table, entire database); environment is to be refreshed periodically and frequently, normally copied from pre-prod, so the procedure must work efficiently and be inexpensive, automatic and scheduler-driven; object renaming (creator) and possible data modifications should be accomplished within the copy process.

Unit test environments: only some tables/databases are required, copied from pre-prod; masking of sensitive fields enforced; data fields possibly modified to satisfy test requirements; regular refresh should be fast and efficient; specification of select and modify functions should be easy.

Developer test beds – test case environments: masking must be enforced for test data copied from pre-prod. Provision of test data for certain test cases can be challenging, e.g. pure copying of production-like data may not be sufficient because the field contents needed are not yet available in production data. This implies that the basic data of pre-prod must be easy to ‘enhance’ to form the needed test case: dates are to be aged or rejuvenated, numbers are to be increased, modified records are to be added, varied or duplicated. Test case data is rarely bulk data; usually a few hundred records with specific attributes are enough. Speed and efficiency are no longer the primary attributes of the copy tool. The requirements now focus on usability: the tool and its user guidance should be convenient and easy to handle, yet flexible enough to avoid programming for intricate cases.
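The kind of ‘enhancement’ described for developer test beds, shifting dates and duplicating records with variations, might look like the following Python sketch. The record layout and the helper names `shift_dates` and `vary_record` are assumptions chosen for this example.

```python
from datetime import date, timedelta


def shift_dates(record, date_fields, days):
    """Return a copy of the record with the given date fields shifted by
    `days`; the sign of `days` chooses the direction."""
    shifted = dict(record)
    for field in date_fields:
        shifted[field] = record[field] + timedelta(days=days)
    return shifted


def vary_record(record, key_field, new_key, **changes):
    """Duplicate a record under a new key and override selected fields."""
    variant = dict(record)
    variant[key_field] = new_key
    variant.update(changes)
    return variant


# Hypothetical contract record taken from a masked pre-prod copy.
contract = {"id": 4711, "start": date(2020, 1, 15),
            "end": date(2020, 12, 31), "premium": 100}

shifted = shift_dates(contract, ["start", "end"], days=365)
variant = vary_record(shifted, "id", 4712, premium=250)
print(shifted)
print(variant)
```

Shifting by a fixed number of days keeps the distance between dates intact but does not preserve calendar anniversaries across leap years. A test case that depends on exact anniversaries would need a shift by calendar years instead.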

In reality, companies have many more data copies than those described above: there may also be reference environments, hotfix environments, training environments, and many more, depending on internal policies. Regardless, for all these different test and staging environments you need a process and a tool to create, refresh and maintain them efficiently.