Things to consider when using DSN1COPY

Many Db2 shops require a faster alternative to UNLOAD/LOAD, and some of them are looking at the DSN1COPY utility. DSN1COPY copies Db2 objects on file system level, but it has no built-in automation, which is why people often write their own scripts to generate both the DSN1COPY jobs and the required control data sets. DSN1COPY only copies VSAM data sets and does nothing else, so additional programs are typically required to perform pre-copy and post-copy work, such as generating DDL, checking the compatibility of objects, stopping and starting objects, running Db2 utilities, and other things.

In many cases, the DSN1COPY processes run well for the set of objects for which they were originally written. However, the processes often need to be revisited if new requirements appear, if the environment changes, or if a new version of Db2 is installed.

Like DSN1COPY, BCV5 also copies Db2 table spaces and indexes on file system level. However, this alternative product provides unmatched flexibility in the object selection and rename process, takes care of DDL, handles differences in the source and target environments automatically, and has numerous built-in checks and copy methods so that it always “get the job done”.

The following list outlines some of the challenges that need to be tackled when implementing a copy process that is based on DSN1COPY; your existing scripts and processes probably handle some of these challenges already. Some of them may not seem important because you are not using a particular feature of Db2. But even things you can ignore at the moment may become important in the future as you introduce new processes and applications into your environment – either home grown or through a vendor product. That is why you should make sure that your existing processes are able to handle those situations.

Changes in requirements

  • You may be asked to create an additional test environment for a new group of testers. Can you easily modify your existing process so that they use different target names? Can you generate the DDL for the new target tables?

Why is this important?

Copy processes should be flexible with regards to the naming conventions of the target environment. It is very common to have not one, but multiple different test environments that share the same Db2 subsystem, but use different database names or schema names.

Changing the environment should be as easy as possible, and copy processes must also be able to create completely new environments if a new group of testers is added. This means that not only the data, but also the table structures need to be copied as well. BCV5 takes care of structures and data and can create new environments with the push of a button.

  • Can you copy from the source table spaces and also from source image copies? Can you check if valid image copies exist for all involved table space? Do you automatically rebuild those target indexes where the source indexes have no image copy?

Why is this important?

A common requirement is to use image copy data sets as input for a copy operation. Since image copies never change after they have been created, this allows you to repeat the copy multiple times with the same source data.

There are several things you need to consider when copying from image copies, though. Indexes may or may not have image copies, and you should only rebuild those target indexes for which no image copy exists in the source environment. If you have partitioned table spaces, you must check if your image copies are on partition level or on table space level, and adjust the copy process accordingly.

Picking the correct image copy data sets can be more difficult that it sounds, especially if you don’t want to use the newest image copy, but one that was taken several days or weeks ago. BCV5 allows you to specify the desired generation number or a time stamp, and chooses the correct image copy data sets automatically.

  • When copying from image copy data sets: Can you detect if the table has been altered after the image copy was created?

Why is this important?

The Db2 catalog keeps track of the current structures of all tables, but you may want to copy from older image copy data sets. If a table was altered since the last image copy, the structure in the catalog does not reflect the contents of the image copy data set. You need to make sure that you only use image copies that were created after the most recent ALTER TABLE and the following REORG has been done. BCV5 checks if the structure of all involved image copies still match the table descriptions from the Db2 catalog and alerts you if there has been a structural change.

  • Can you easily include more objects into your process without having to modify any existing jobs?

Why is this important?

Updates to an application may add more tables or indexes to an existing source environment. It can be tedious to compare the copy jobs against the modified structures and to add all new objects manually. Object selection should use patterns with wild cards, so that you can easily select hundreds of objects with a single rule. The copy process should then allocate the required data sets dynamically so that you don’t need to modify your existing jobs or add new jobs if you want to include more objects.

  • Are your jobs scheduler-friendly? Ideally, the JCL should never change so that you can copy it into a scheduler library and never worry about it again.

Why is this important?

Copy processes are often executed by a job scheduler. Schedulers work best if you have a predefined set of jobs, and both the number of jobs and the contents of the jobs should not change after the initial setup. If you add or remove jobs, you need to update the scheduler configuration. If you change the contents of jobs, you need to refresh the jobs in the scheduler library every time something changes.

This added complexity can be avoided with BCV5. The job chain that is created by BCV5 does not change when objects in the source or target environment change, or when you change the selection criteria or rename rules of the copy process.

Changes in the source or target environment

  • Can you detect if a source object is in a restricted state?

Why is this important?

The source objects for a copy operation should be in a good state. Db2 can restrict access to an object if there is a problem. This is called a “restricted state”. Sometimes, a restricted state is irrelevant for making a copy – such as the copy pending state, which means that the object has no usable backup – but sometimes it is very important and needs to be resolved.

For example, when an object is in check pending state, Db2 cannot guarantee relational consistency and making a copy might result in inconsistent data in the target tables. Then there are things like the recover pending state, which indicates that an object is broken and you must resolve this state before making a copy. BCV5 checks for relevant restricted states in the source environment and stops the copy process at a very early stage so that you don’t waste time and resources.

  • If you stop the source table spaces: can you detect if they are really stopped, or if the stop command is still pending?

Why is this important?

If you decide to stop the source objects during the copy, you normally execute a STOP DATABASE command. This command requests that the object be stopped, but it runs asynchronously. If there are any open transactions that are using the object, it will not be stopped immediately. Instead, it is placed in stop pending mode, and you must wait for the object to stop before you can make a consistent copy. BCV5 has the option to verify that all objects have been stopped properly before the data is moved.

  • If you don’t stop the source table spaces, can you guarantee that the copy will be consistent?

Why is this important?

You may not be able to stop the source environment for a copy operation because users and applications require read and write access. Making a consistent copy under these circumstances can be a complicated process that consumes a lot of time and resources. BCV5 has a log apply component1 that uses the log records from the source objects in order to make the target objects consistent. This works for all types of table spaces and indexes and allows you to make consistent copies without any impact on the availability of your source objects.

  • Can you handle new objects (for example, new indexes) and dropped objects in the source without having to modify your process?

Why is this important?

The copy process should always examine the Db2 catalog in order to check if any objects have been added or removed, or if objects have changed. When new objects are created, the copy process should include them if they match the selection criteria. This includes creating the new objects in the target environment if required.

Even if objects have not changed logically, there are several occasions where the physical representation of your objects change. If you reorganize an object, the corresponding data sets may change names. Db2 may add or remove source VSAM data sets if a lot of data is added, or if an object is reorganized after data has been deleted from tables. With BCV5, you get a robust copy process that takes this into consideration and adapt without any changes to the JCL.

  • Do you compare the source and target attributes of all involved table spaces, tables, and indexes in order to make sure they are 100% compatible?

Why is this important?

The usage of DSN1COPY requires careful preparation. You need to make sure that all attributes of your table spaces, tables and indexes match perfectly. This includes, but is not limited to, the page size, segment size, encoding scheme, number of partitions and partition limit keys of your table spaces, the number of columns, order or columns, data types, lengths, encoding schemes, and NULL-attributes of your tables, and the number and order of columns, page size, sort order and unique attributes of your indexes.

If something is not 100% compatible, the DSN1COPY utility will not produce error messages, but the resulting target objects may be unusable. Problems may be subtle – sometimes a target object seems to work until you access a particular row. An automatic compatibility check, which is included in BCV5, is key to making a successful copy on file system level.

  • When you detect a compatibility problem, can you resolve it automatically by either using UNLOAD/LOAD for the object, or by dropping and recreating the target object (only the problematic target object, not the entire target environment!) with the same attributes as the source object? Can you automatically recreate all objects that were dropped (for example, when dropping one single table, Db2 will also drop all associated indexes, foreign key constraints, and GRANTs)?

Why is this important?

When source and target objects are not 100% compatible, you have two options. Option one is to use UNLOAD/LOAD to copy the data, which is a lot slower than DSN1COPY, but can handle certain structural differences. Option two is to drop the target object and recreate it with the same attributes as the corresponding source object. The challenge here is that Db2 objects have depending objects. After dropping a single table, you may have to recreate not only that table, but also dozens of other objects. This includes indexes, referential integrity constraints to other tables, access authorization, related LOB table spaces, auxiliary tables and indexes, clones, aliases, and other objects.

If your process includes a mechanism to generate DDL, it may be tempting to drop and recreate all target objects unconditionally. However, this may take a lot of time and it is not necessary if the objects have the same attributes as in the source. BCV5 gives you different options to handle structural differences in the target environment, and it can be configured so that only incompatible objects are dropped and recreated.

  • Can you detect if additional indexes have been created in the target (for example, by a tester) and rebuild them automatically after copying the associated table space?

Why is this important?

Testers sometimes add indexes in order to improve the performance. Even if the application that is being tested does not make use of these new indexes, it is possible that tests include verification steps that run much faster with additional indexes. When you copy a table space, you need to make sure that all associated target indexes are refreshed as well – either by copying the source index, or by rebuilding the target index.

For indexes that only exist in the target environment, the only option is to rebuild them. At the same time, it is time consuming to rebuild all indexes. BCV5 detects additional indexes in the target environment and only rebuilds those indexes for which no corresponding source index exists.

  • Can you detect if a source object has grown considerably in size, and re-allocate the target VSAM cluster large enough?

Why is this important?

DSN1COPY requires you to allocate the target VSAM clusters. Since table spaces can grow and shrink in size, it is important to ensure that the target VSAM clusters have the correct sizes before copying the data. If a VSAM cluster is allocated too small, the copy process might fail. If it is allocated too large, space is wasted. BCV5 determines the correct size by examining the source ICF catalog, which, unlike the Db2 catalog, always contains an accurate size for every source VSAM cluster.

  • Can you handle multi-piece table spaces properly?

Why is this important?

Non-partitioned table spaces (i.e., simple and segmented table spaces) and indexes can be represented by multiple VSAM clusters. Db2 splits the logical object into multiple physical parts (usually at 2 GB or 4 GB boundaries). This is not directly visible in the Db2 catalog. You need to detect how many pieces currently exist in source and target, allocate missing target VSAM clusters, delete superfluous target VSAM clusters, and make sure you copy every existing source VSAM cluster correctly. When copying from image copy, the image copy may be one big data set instead of multiple smaller data sets, so you must make sure to switch to the next target VSAM cluster at the same boundary that Db2 uses. BCV5 automatically handles this situation.

Support for modern Db2 features

  • Can you handle versioning properly (detect if columns have been added or changed, detect if the required REORG has been done, update versioning information in the target Db2 catalog after the copy)?

Why is this important?

Adding columns to an existing table or changing the data type of an existing column introduces versioning. Db2 does not physically add the column until the next time the table space is reorganized, but the column already exists logically. The source Db2 knows that the table used to have a different structure and can handle “old” rows automatically.

When you copy the table space using DSN1COPY, the target Db2 subsystem may not how to interpret “old” rows correctly. This results in errors when the target table space is accessed. Therefore, when a table space is copied on file system level and one of its tables has been altered, it is important to reorganize the table space before copying it. BCV5 automatically checks for table spaces that require a reorganization because of added or changed columns, and can generate the required LISTDEF statements that you can use in a REORG job. In addition, at the end of the copy process, BCV5 adjusts the versioning information in the target Db2 catalog so that it matches the contents of the target VSAM clusters.

  • Can you handle partition-by-growth table spaces? Do you detect if Db2 adds new partitions to the source, and automatically add them in the target?

Why is this important?

When a partition-by-growth universal table space grows in size, Db2 can add new partitions dynamically. This results in an additional VSAM cluster for the table space, and it is reflected in the Db2 catalog as well. When copying such a table space, you must make sure that the target Db2 catalog contains the correct number of partitions – otherwise, the target Db2 subsystem cannot access the most recently added partitions, even if the target VSAM clusters exist. BCV5 automatically adjusts the value in the target Db2 catalog so that it reflects the actual number of partitions.

  • Can you handle XML table spaces, including XML string ID translation?

Why is this important?

XML table spaces use numbers instead of strings for the tags in an XML document, which saves space and allows for easier indexing of tags. These numbers are called XML string IDs, and they are assigned automatically whenever a Db2 subsystem encounters a new XML tag. The XML string IDs in the source Db2 subsystem can differ from those in the target Db2 subsystem, which is why they must be translated during the copy. BCV5 handles this translation automatically. DSN1COPY does not support XML string ID translation.

  • Can you handle clone tables correctly?

Why is this important?

When you have clone tables, Db2 creates another instance of the table space and creates a new VSAM cluster for it. In order to copy the correct VSAM cluster, you need to determine which instance is currently active. In addition, during the copy, the hi-order bit in several fields inside the table space has to be adjusted so that it correctly reflects the target instance number. This adjustment cannot be done by DSN1COPY, but BCV5 will handle it automatically

  • Can you handle temporal tables and archive tables correctly?

Why is this important?

Temporal tables and archive tables are relatively new Db2 features. Although not used excessively, there is always a chance that new applications – developed in-house or purchased from a vendor – make use of those features, and in order to properly test these applications, the temporal and archive tables need to be copied as well. BCV5 supports these objects.

Other advantages

  • Can you easily select the objects to be copied by specifying a database name or a schema name?

Why is this important?

When setting up new processes, you often need to copy all tables under a given schema, or all tables that are in a given database. Depending on the number of tables and indexes that are involved, it can take a considerable amount of time to manually created the required jobs and control statements. BCV5 simplifies the selection process by allowing you to specify rules with wild cards. These rules are evaluated every time the BCV5 job chain runs, so your list of objects is always up to date.

  • Can you rename the objects in the target?

Why is this important?

Test environments often use different naming conventions, therefore it is important that you have an easy mechanism to rename object names, schema names, etc. If you want to have two or more copies of the same environment in the same Db2 subsystem, renaming the objects becomes mandatory to avoid naming conflicts. BCV5 allows you to rename virtually all attributes of all objects, including the SQLID that is used to create new objects, by specfying a set of rules. The new names are reflected in the generated DDL, and they are taken into consideration when determining the names of the target VSAM clusters.

  • Can you copy both RUNSTATS and RTS information from source to target?

Why is this important?

The Db2 optimizer relies on RUNSTATS information in order to determine the best access path when queries are executed. While you can execute the RUNSTATS utility in the target environment after making the copy, it is much faster to copy the required information directly from the source Db2 catalog to the target Db2 catalog, which is what BCV5 does. In addition, BCV5 also copies the real time statistics (RTS). Those statistics are important for the dynamic allocation of sort work data sets when you run maintenance jobs in the target environment.

  • Does your process have automatic load balancing? Can you easily adjust the number of parallel threads?

Why is this important?

When designing a job chain for DSN1COPY that uses parallelism, you should distribute the objects among the parallel jobs so that every job takes approximately the same amount of time to execute. Failure to do so can result in a situation where one job takes a very long time to complete while all the other jobs finish quickly. This delays the entire process. In addition, objects can change in size over time, which is why you regularly need to revisit the jobs and make adjustments. BVC5 uses one single copy job, but spawns a user-definable number of thread inside the job. All threads take objects from the same object pool, which results in automatic load balancing and threads usually finish in approximately the same time.

  • Do you automatically alter identity columns and sequence objects so that INSERT operations in the target environment will not try to use duplicate key values?

Why is this important?

When inserting new rows into the target tables, Db2 can automatically generate unique numbers for identity columns, and you can also get unique numbers from sequences. These number are often used in key columns. After copying your tables, you need to make sure that the next number that an identity column or sequence will generate is larger than the current largest number that exists in the table. Failure to do so may cause duplicate key errors when you insert new rows. BCV5 automatically adjust these values for identity columns, sequences, and implicit sequences that Db2 creates for XML columns.

  • Do you have support for creating the following target objects if required: views (including views on views), aliases, synonyms, triggers, grants, stored procedures, packages, comments, labels, referential integrity constraints, check constraints, materialized query tables, user defined functions, and Db2 storage groups?

Why is this important?

Applications sometimes do not use the tables directly, but refer to views, aliases or synonyms instead. If you do not generate these objects in the target, the application cannot run even though all required tables exist. Similarly, the other object types are often required so that your target environment works exactly like your source environment. BCV5 can automatically generate DDL for all these object types, and you can choose which ones you need and which ones you don’t need.

  • Can you easily copy between two different LPARs, even if there is no shared DASD pool available?

Why is this important?

Test environments need to be separated from production, and one way to achieve this is by putting the test environments on a different LPAR. This ensures that test jobs do not influence the performance of production jobs. When the test environment is separated in a way that no shared DASD pool exists between the production and test LPAR, copying data can become a challenge. BCV5 is able to copy data over a TCP/IP connection, which enables you to make copies across LPAR boundaries without the need for temporary data sets.

  • If a job should fail, can you continue the process without having to start over completely?

Why is this important?

When a problem occurs while copying a large amount of data, you should be able to continue the copy process where it stopped after correcting the cause of the problem (for example, an authorization problem). Having to start over and copy all objects again is a waste of time and resources. BCV5 automatically keeps track of which objects have been copied successfully, and it allows you to re-submit the copy job without any changes to the JCL. It will detect that it is running in “restart mode” and only copy those objects that could not be copied in the previous attempt. This mechanism works for all jobs that are part of the BCV5 job chain – you never need to modify the JCL if you need to submit a job twice.

Organizational challenges

  • When a job fails, is there more than one person that understands and can fix the process?

Why is this important?

Being able to read and interpret messages from failed jobs is crucial when the process runs into problems because you want to be able to continue as soon as possible. If only one person in your organization is able to get the copy process back on track, you may suffer delays in your data provisioning process. This can be avoided by using a vendor product like BCV5, where you have a support team available that can help you diagnosing and correcting problems.

  • Can your DSN1COPY processes be maintained by people other than the original author if features need to be added?

Why is this important?

If the existing processes need to be modified in order to support new features, or in order to correct problems, it is important that there is more than just one person in your organization that is knowledgeable enough to do so. The original author of the involved jobs and scripts may no longer be available, or may work on other important projects. This can lead to situations where required fixes cannot be made, and you suffer from delays and unavailable test data. Since BCV5 is a vendor product, there is an entire company dedicated to support and development. This helps you getting support for new requirements much faster and in a more reliable way.

  • Can you quickly adapt your processes to the next version of Db2?

Why is this important?

With every Db2 version, IBM introduces new features, deprecates old features, and may change the internal structure of your objects. You may need to modify your existing processes so that they support the new features and check for any new compatibility problems that may arise. This can be a complex and time-consuming process. BCV5 is maintained and updates by UBS Hainer GmbH. We take care of updating the software and testing it against all supported releases of Db2, and all possible combinations of source Db2 version and target Db2 version.

1The log apply component is a priced feature of BCV5.