This article explores how XDM’s innovative approach allows organizations to generate complex synthetic datasets that seamlessly integrate with existing development processes. Discover how this powerful tool can revolutionize your test data strategy while enhancing software quality, security, and regulatory compliance.
Generate intelligent, realistic test data
Synthetic test data is becoming increasingly important, especially where data protection regulations or lack of access make it difficult to use production data. At the same time, copies of production data remain a proven and valuable part of many testing strategies, particularly when tests need to represent real data patterns or complex relationships accurately.
This is where XDM comes in. By leveraging artificial intelligence, XDM creates high-quality synthetic data that closely mirrors real data in structure and content, yet is free of any personal information. The result is a flexible combination of data protection, quality, and efficiency in test data management.
XDM stands out for its versatility and scalability, making it capable of generating even complex datasets. It allows for the definition of arbitrary data structures and the automated generation of test datasets that can be seamlessly integrated into existing development and testing processes.
A particularly noteworthy feature is the ability to generate realistic data based on existing patterns or to simulate rare scenarios in a targeted way. This allows companies not only to ensure data protection but also to significantly enhance the quality and security of their software.
Solution: AI-generated synthetic test data with XDM
XDM uses artificial intelligence to generate synthetic but realistic test data. The process consists of the following steps:
1. Define the data structure
First, the data model of the application is defined. In this example, we use a simple application for managing employees and their departments. An employee, for instance, has attributes such as first name, last name, date of birth, and email address.
An example of the definition of an employee in XDM:
Defining the data model is a one-time step. Once XDM knows the structure, this model can be used, among other things, for generating test data.
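To make the idea of a one-time model definition concrete, here is a minimal sketch in plain Python. It is not XDM's definition syntax; the class names, the `employee_number` field, and the `Department` attributes are assumptions chosen to match the employee example.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class Department:
    # Hypothetical attribute; the article does not list department attributes.
    name: str


@dataclass
class Employee:
    employee_number: int
    first_name: str
    last_name: str
    date_of_birth: date
    email: str
    department: Department


# The model is declared once. Other components can then inspect it,
# for example to find out which attributes a generator has to fill.
attribute_names = [f.name for f in fields(Employee)]
```

Because the structure is declared in one place, generators and output formats can reuse it instead of repeating the attribute list.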
2. Define Entity Generator
In the next step, entity generators are created that can generate records for the defined entity types. In our example, this is an entity generator for creating employee instances. Traditional approaches often require extensive programming to generate the data. Each attribute has to be manually filled with meaningful values.
XDM addresses this problem using a Large Language Model (LLM). The LLM is used to generate plausible records for an employee.
The following example shows how an LLM can be used in XDM to generate a realistic employee:
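The snippet below is not XDM configuration. It is a rough Python sketch of the general technique: attribute descriptions are turned into a prompt, a language model returns one record as JSON, and the result is checked against the expected attributes. The attribute descriptions, the function names, and the `fake_llm` stand-in are all assumptions; a real setup would pass in a function that calls an actual model.

```python
import json
from typing import Callable

# Hypothetical attribute descriptions; in practice they would come from the data model.
EMPLOYEE_ATTRIBUTES = {
    "first_name": "given name of the employee",
    "last_name": "family name of the employee",
    "date_of_birth": "ISO date; the employee is of working age",
    "email": "work email address derived from the name",
}


def build_prompt(entity: str, attributes: dict) -> str:
    """Describe the wanted record to the model, one line per attribute."""
    lines = [f"- {name}: {description}" for name, description in attributes.items()]
    return (
        f"Generate one realistic but fictional {entity} record.\n"
        "Return a single JSON object with exactly these keys:\n" + "\n".join(lines)
    )


def generate_entity(entity: str, attributes: dict, complete: Callable[[str], str]) -> dict:
    """Ask the model for one record and keep only the declared attributes."""
    record = json.loads(complete(build_prompt(entity, attributes)))
    missing = set(attributes) - set(record)
    if missing:
        raise ValueError(f"LLM response lacks attributes: {sorted(missing)}")
    return {name: record[name] for name in attributes}


def fake_llm(prompt: str) -> str:
    """Offline stand-in for a real model call; returns a fixed placeholder record."""
    return json.dumps({
        "first_name": "Jane",
        "last_name": "Doe",
        "date_of_birth": "1990-04-12",
        "email": "jane.doe@example.com",
    })


employee = generate_entity("employee", EMPLOYEE_ATTRIBUTES, fake_llm)
```

Passing the model call in as a function keeps the generator independent of any particular LLM provider and makes it testable without network access.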
If production data already exists, XDM can analyze its distributions and dependencies. These statistics can then be fed into the generation process so that value ranges and relationships are replicated realistically. Combining attribute descriptions with AI-driven generation yields consistent, plausible data.
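How XDM performs this analysis is not described here. As a simplified illustration of the principle, the following Python sketch counts how often values occur in existing rows, records one dependency between two attributes, and samples new values with the same relative frequencies. The `job_title` attribute and the sample rows are hypothetical.

```python
import random
from collections import Counter, defaultdict


def learn_conditional(rows, given, target):
    """Count how often each target value occurs for each value of `given`."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[given]][row[target]] += 1
    return counts


def sample(counter, rng):
    """Draw one value, weighted by its observed frequency."""
    values = list(counter)
    weights = list(counter.values())
    return rng.choices(values, weights=weights, k=1)[0]


# Hypothetical stand-in for rows analyzed from an existing data source.
existing_rows = [
    {"department": "Sales", "job_title": "Account Manager"},
    {"department": "Sales", "job_title": "Sales Assistant"},
    {"department": "IT", "job_title": "Developer"},
    {"department": "IT", "job_title": "Developer"},
    {"department": "IT", "job_title": "Administrator"},
]

department_counts = Counter(row["department"] for row in existing_rows)
titles_by_department = learn_conditional(existing_rows, "department", "job_title")

rng = random.Random(42)  # fixed seed so runs are repeatable
department = sample(department_counts, rng)
job_title = sample(titles_by_department[department], rng)
```

Sampling the job title conditionally on the department preserves the dependency between the two attributes, so a generated record never pairs a department with a title that was not observed for it.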
3. Create File Scenario
A file scenario defines the data storage format in which the generated data should be produced. For our example, we want to generate the records in CSV format.
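Independent of how XDM configures a file scenario, the effect of choosing CSV as the output format can be sketched with Python's standard `csv` module. The function name and the two placeholder records are assumptions for illustration; they are not output produced by XDM.

```python
import csv
import io


def write_csv(records, fieldnames, target):
    """Write a header row followed by one CSV row per record."""
    writer = csv.DictWriter(target, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)


# Placeholder records, used only to exercise the writer.
records = [
    {"first_name": "Jane", "last_name": "Doe", "email": "jane.doe@example.com"},
    {"first_name": "John", "last_name": "Roe", "email": "john.roe@example.com"},
]

buffer = io.StringIO()
write_csv(records, ["first_name", "last_name", "email"], buffer)
csv_lines = buffer.getvalue().splitlines()
```

When writing to a real file instead of an in-memory buffer, open it with `newline=""` so the `csv` module controls the line endings.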
4. Create and execute a Task
The final step is to create a task object that brings together the necessary components. In our example, we create a task that calls the entity generator and writes the generated employee records to a CSV file.
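Conceptually, such a task is a small piece of glue: it calls the generator a given number of times and hands each record to the output format. The Python sketch below shows that composition under stated assumptions; the `Task` class, its fields, and the stub generator are hypothetical and do not reflect XDM's actual task object.

```python
import csv
import io
from dataclasses import dataclass
from itertools import count
from typing import Callable, List, TextIO


@dataclass
class Task:
    generate: Callable[[], dict]  # entity generator: returns one record per call
    fieldnames: List[str]         # column order of the CSV output
    record_count: int             # how many records to produce

    def run(self, target: TextIO) -> int:
        """Generate the records and write them to `target` as CSV."""
        writer = csv.DictWriter(target, fieldnames=self.fieldnames)
        writer.writeheader()
        for _ in range(self.record_count):
            writer.writerow(self.generate())
        return self.record_count


# Stub generator so the sketch runs without an LLM.
_sequence = count(1)


def stub_employee() -> dict:
    n = next(_sequence)
    return {"first_name": f"First{n}", "last_name": f"Last{n}"}


output = io.StringIO()
written = Task(stub_employee, ["first_name", "last_name"], 3).run(output)
task_lines = output.getvalue().splitlines()
```

Keeping the generator and the output format as separate inputs means either one can be swapped without touching the other.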
After the task is executed, the generated data is saved in a CSV file. The generated data for our example looks as follows:
Supplying the business context to the generator makes the resulting datasets more useful, because constraints and dependencies that are specific to the software under test can be taken into account.
Customizations
If the data needs to match external systems or the uniqueness of key elements must be ensured, the generated records can be customized within the respective generator:
In the example above, an attribute generator is created for the employee number. This generator ensures that all generated employee numbers are greater than 600,000 and that the values are unique. This makes it easy to implement adjustments that an LLM cannot practically handle, because it has no persistent memory of which values it has already generated.
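The logic of such an attribute generator can be sketched in a few lines of Python. This is an illustration of the technique, not XDM's generator syntax; the class name, the upper bound of 999,999, and the seed are assumptions.

```python
import random


class UniqueNumberGenerator:
    """Produce unique integers greater than a lower bound."""

    def __init__(self, minimum_exclusive, maximum, seed=None):
        self._low = minimum_exclusive + 1
        self._high = maximum
        self._used = set()  # the memory an LLM does not have
        self._rng = random.Random(seed)

    def next(self):
        # Stop once every value in the range has been handed out.
        if len(self._used) > self._high - self._low:
            raise RuntimeError("value range exhausted")
        while True:
            candidate = self._rng.randint(self._low, self._high)
            if candidate not in self._used:
                self._used.add(candidate)
                return candidate


employee_numbers = UniqueNumberGenerator(600_000, 999_999, seed=1)

# Overwrite the attribute after the record has been generated.
record = {"first_name": "Jane", "last_name": "Doe"}
record["employee_number"] = employee_numbers.next()
```

The set of already issued values lives in ordinary program state, so uniqueness is guaranteed by construction rather than left to the language model.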
Conclusion
With XDM, you can automatically generate high-quality, realistic, and GDPR-compliant test data. Thanks to the use of artificial intelligence, the effort required to implement the generators is minimal.
Take advantage of artificial intelligence to optimize your software testing and make your development processes more efficient.
Get started now: try XDM and revolutionize your test data strategy!