Copying Db2 objects
with FlashCopy

Copying Db2 tablespaces and indexes on file system level is fast and reduces CPU consumption when compared to traditional methods of copying Db2 data, such as Unload/Load. Db2 ships with the DSN1COPY tool, which operates on the VSAM clusters that Db2 uses to store data. Vendor solutions, such as BCV5, use the same basic idea as DSN1COPY but take it to the next level by adding flexible selection and renaming mechanisms and provide a scheduler-friendly job chain for copying an arbitrary number of objects with unmatched flexibility. Tools like DSN1COPY, BCV5, and other vendor tools usually access the VSAM clusters using standard access methods provided by z/OS. While this is already about 10 times as fast as Unload/Load based solutions, it might seem tempting to use a different approach to copying VSAM clusters: hardware assisted data set level copies.

The promise of hardware assisted copy tools

Hardware assisted copy tools are most often used to make point-in-time copies of a volume or of a group of volumes. One of the most well-known tools is probably FlashCopy from IBM. Tools from other storage solution vendors work in a similar way – and thus have the same restrictions – but use different names, such as Dell EMC Snap, Hitachi ShadowImage, and others. FlashCopy V2 supports copying data sets (or extents in FlashCopy terminology) in addition to entire volumes, and it also supports track relocation. This makes it possible to make a copy even if the source and target VSAM clusters do not occupy the exact same tracks on their respective volumes. When copying Db2 tablespaces and indexes, data set level FlashCopy may seem like a good fit. Usually, FlashCopy is used in combination with ADRDSSU because FlashCopy itself only copies the physical raw data. It does not perform any data management functions, such as defining or cataloging the target VSAM clusters.

The idea behind FlashCopy is that you establish a relationship between the tracks on the source and target volumes that are occupied by the source and target VSAM clusters. This group of tracks usually sits in multiple adjacent areas of your volumes. These areas are called track sets, which means that a copy operation will work on multiple track sets. FlashCopy then operates in the background and copies the specified tracks from the source volumes to the target volumes. After copying a track set, the relationship is automatically released.

What happens behind the curtains when FlashCopy makes copies of data sets is not trivial, but a simplified explanation looks like this:

For each source track, the following rules apply:

  • Reading, either before or after the physical copy, always returns the current source data.
  • Writing before it has been copied causes FlashCopy to copy the track synchronously, then release the relationship and write to the source track. This ensures that the target track reflects the data before the change, which is important for a point-in-time copy.
  • Writing after it has been copied modifies the source track right away without any consequences to the target (because the relationship has already been released).

For each target track, the following rules apply:

  • Reading before it has been copied returns data from the corresponding source track. Think of the target track as a pointer to the source track.
  • Reading after it has been copied returns data from the actual target track.
  • Writing before it has been copied causes FlashCopy to copy the track synchronously, then release the relationship and write to the target track.
  • Writing after it has been copied modifies the target track right away.

The logical copy is instantaneous and you can work with both the source and the target data sets as if they were separate, even if FlashCopy is still busy moving data in the background. Eventually, the
physical copy will complete, but you are free to read from and write to both source and target at any time.

What you get is an exact 1:1 copy of your source data sets. Every target data set will look identical to its corresponding source data set at the point in time when the copy was triggered. This fact is very important because it is the main reason that FlashCopy is not an ideal choice for copying Db2 tablespaces and indexes.

more»

Structural considerations

Obviously, for a file system level copy of a Db2 object to work, the structures of the associated tablespaces, tables and indexes must be 100% identical in source and target. This means that for tablespaces, the page size, DSSIZE, number of partitions, encoding scheme, MEMBER CLUSTER attribute, hash organization attribute and the type of tablespace (simple, segmented, partitioned, universal PBR, universal PBG, universal PBR2) must match. For all involved tables, the number of columns, data types, column lengths, NULL attributes, encoding, edit and field procedures, and order of columns must match. For indexes, the number of columns, order of columns, unique rule, and ascending / descending attributes must match. FlashCopy does not check these attributes – that is not its job. You must use different tools that specialize in DDL generation and replication for this purpose.

One important aspect is XML data. Db2 uses XML tablespaces that reference string constants in the DB2 catalog. If the source and target Db2 are two different subsystems, it is usually impossible to copy an XML tablespace on file system level, regardless of whether you use DSN1COPY or FlashCopy. The exception to this rule is BCV5, which can translate the additional references found in an XML tablespace and therefore allows for fast filesystem level copies of XML data between different Db2 subsystems.

Keep in mind that tables can be altered, and that altering a table introduces versioning. Having more than one active version in a tablespace that you intent to copy on file system level is asking for trouble. You must reorganize your source tablespaces if they contain tables that have been altered, i.e., tables where columns were added or changed. In addition, you absolutely must run the REPAIR CATALOG utility (in older versions of Db2 it was called REPAIR VERSIONS) after the copy process on all involved target objects. This utility synchronizes the versioning information in the Db2 catalog with the information in the page set. It also adjusts catalog entries referring to the row format and RBA format of a page set. The Db2 Utility Guide and Reference contains a DSN1COPY process when versioning is involved, and this description includes running the REORG and REPAIR CATALOG utilities. The procedure is no different if you use FlashCopy.

What you don’t get

Even when all the structures match between source and target, a point in time copy of the VSAM clusters that belong to your Db2 tablespaces and indexes is only half the battle. Unfortunately, a Db2 page set is not an autonomous object. It has dependencies on information in the Db2 catalog and also on the Db2 log. Here are some of these dependencies (the list is by no means exhaustive):

  • In standalone Db2 subsystems, every page of every tablespace and every index contains a so-called log RBA. This is either a 6-byte or 10-byte value that indicates the log RBA when the page was last updated. In data sharing systems, an LRSN value is used instead of an RBA.
  • The header page of each tablespace contains its DBID and PSID. In addition, it contains the OBID of the table that is in the tablespace, if it is the only table in the tablespace.
  • In a segmented tablespace (this covers “normal” segmented tablespaces, but also range-partitioned universal tablespaces and partition-by-growth universal tablespaces), each space map page contains a list of segments, and each segment entry contains either the OBID of a table inside the tablespace (if it is a data segment) or the PSID of the tablespace (if it is a dictionary segment).
  • Compression and decompression dictionaries contain the log RBA of the point in time when they were created
  • The header page of each indexspace contains its DBID and ISOBID. It also contains the OBID of the table on which the index is defined.
  • Each index data page contains the OBID of the index.
  • Index directory pages also contain both the ISOBID and the OBID of the index.

The log RBA in each page, as well as the RBA in the dictionary (if present), cannot be higher than the current log RBA of the target Db2 subsystem. Obviously, this is not a problem when copying objects within one Db2 subsystem. But when copying from one Db2 subsystem to another, the target will refuse to accept the objects if its log RBA is not high enough. To solve this problem, there are two approaches:

1. Reset the log RBA in each page to zero. FlashCopy cannot do this because it always makes a 1:1 copy. Tools that do logical processing, such as DSN1COPY or BCV5, can reset this value.

2. Increase the log RBA of the target Db2 subsystem by creating a conditional restart record. You will need to cycle the target Db2. Keep in mind that when the source Db2 is a production, its log RBA is likely to increase faster than the log RBA of the target Db2 (which may be a test system), so you must update the log RBA of the target Db2 every time you make a new copy.

In a data sharing group, it may seem as if the log LRSN in each page is never a problem since it is time-based, but keep in mind that Db2 data sharing groups may use an STCK-to-LRSN offset. If this offset is non-zero, the same problems as in standalone subsystem can occur.

All other pieces of information need to match the corresponding values in the target Db2 catalog. Obviously, if the objects in the target Db2 subsystem have the exact same DBID, PSID and OBID values as their corresponding source objects, the values will match. But it is usually very difficult to guarantee that the IDs will be the same. Often, some or all the IDs will differ, which means that the corresponding fields in the target tablespaces and indexes need to be adjusted.

Sometimes, we see Db2 shops using FlashCopy followed by the REPAIR utility. The REPAIR utility can update single bytes in a page of a Db2 object. In this scenario, it is often used to inject the correct target DBID and PSID into the header pages of each target VSAM clusters. This way, only the OBIDs must match, which is a bit easier to accomplish. However, using REPAIR to update the header page is a flawed solution because the PSID of a tablespace can appear in segmented space map pages as well. Indexes have their own OBID in each data page, but Db2 often tolerates if this OBID is wrong. But there is no guarantee that at some point in the future, Db2 won’t look at that wrong OBID in the index data page and consider the index broken.

If the OBIDs of the involved tables do not match, things get much more complicated. The OBIDs show up in all segmented space map pages, and, if the tablespace is not a universal tablespace, they also show up in each record on each data page. This means that there can be millions of occurrences of the OBID values inside the tablespace, and they all need to be updated or Db2 will not accept the target object. With the introduction of universal tablespaces, Db2 no longer uses the OBID field in the header of each record – theoretically. Due to an issue with the REORG utility, it is possible that universal tablespaces still contain the OBID in some records. APAR PM55734 addresses this problem, and you may need to deal with it when copying objects.

The fact that segmented space map pages contain OBIDs is pretty much the deal breaker. It means that if you cannot guarantee that all OBIDs are 100% identical in source and target, the target object will not be usable after a FlashCopy. You would have to post-process every target object in its entirety and change hundreds or thousands of occurrences of the OBIDs in the space maps alone, and – if you are not exclusively using universal tablespaces – in every single data page as well. This takes a considerable amount of wall clock time and CPU time and renders the benefits of a fast point-in-time copy useless.

A word about consistency

One common assumption of using FlashCopy is that it will automatically give you a consistent copy of all involved objects. This is not the case. You only get consistency if you stop the source objects for write access for that moment where you trigger the copy operation, and if you also make sure that Db2 flushes its bufferpools to DASD before FlashCopy starts.
The reason is that a point-in-time copy does not guarantee that there are no ongoing transactions on your source objects at the very point in time where the logical copy is made. This means that if a transaction makes two UPDATES, FlashCopy might copy the object while the first UPDATE was already done but the second one was not. It is also possible that an INSERT operation has already modified the tablespace, but not yet updated all related indexes when FlashCopy strikes.
The result of these inconsistencies cannot be resolved in the target because the target Db2 does not have any information about the ongoing source transactions in its log. In fact, the target Db2 log for the objects becomes completely useless since file system level copy operation is essentially a non-recoverable event (NRE) because there is no information in the Db2 log or the SYSIBM.SYSCOPY table that would indicate the data in the target table spaces and indexes has changed. Therefore, after the copy, the RECOVER utility will not be able to process your target objects until you create a new full image copy.

When you do not resolve inconsistencies in the target environment, numerous things can happen:

Queries that use indexes get different results than queries that do not use indexes

  • LOB or XML values may not be accessible
  • Referential integrity constraints are broken
  • Relationships between pointer records and overflow records are broken (resulting in Db2 error 00C90207)
  • Other Db2 data manager errors are possible

You can use REORG TABLESPACE on all target objects to make sure that broken pointer records are fixed, and that all indexes match the data in their tables (because REORG will rebuild the indexes), and you can use CHECK DATA to ensure referential consistency. Both utilities can take a considerable amount of time and use a lot of CPU, especially when dealing with a large amount of data.

Bottom line

FlashCopy always makes 1:1 copies, but Db2 page sets have dependencies on the Db2 catalog and log. After making a 1:1 copy, you only have very limited capabilities to change values (DBID, PSID) in the target objects by using the REPAIR utility. Therefore, using FlashCopy to copy Db2 tablespaces and indexes is only feasible if:

the OBIDs of all source and target tables of a tablespace are equal

  • if the current log RBA of the source subsystem is less or equal to the RBA of the target subsystem
  • the current version of the source tablespace (SYSTABLESPACE.CURRENT_VERSION) is greater or equal to the current version of the target tablespace
  • the tablespace attributes SEGSIZE, PGSIZE, ENCODING_SCHEME, DSSIZE and PARTITIONS of source and target are equal
  • the structures of the involved tables of source and target are equal.
  • the respective tablespace is not a XML tablespace
  • the tablespace contains a table that has been dropped in the past and no REORG has been done after that action (so the tablespace may still contain records of the dropped table)
  • you copy from a data sharing Db2 to a data sharing Db2 or from a non data sharing Db2 to a non data sharing Db2

In almost all cases, it is impossible to fulfill all these requirements, particularly when copying many objects. Therefore, using hardware assisted copy tools, such as FlashCopy, are not a good choice for copying Db2 data. FlashCopy is fast, but can only make 1:1 copies. What you really need is a tool that is aware of the internal structure of Db2 page sets and can adjust fields that depend on the Db2 catalog and log. UBS Hainer offers BCV5, which provides unmatched flexibility in the object selection process, allows you to rename objects, copy page sets in a consistent way without restricting write access to the source tables, and comes with a fast data set level copy engine that will make the required modifications to the page sets. The BCV5 process is scheduler friendly and automatically adapts to changing structure in source or target, making it the ideal tool to copy Db2 data.

Disclaimer
FlashCopy is a trademark of IBM Corporation. ShadowImage is a trademark of Hitachi Data Systems Corporation. Dell EMC is a trademark of Dell Inc. All other trademarks are the property of their respective owners.