Archiving Data

Keeping your growing tables small and efficient is important for performance and maintenance. Smaller tables and indexes are accessed faster and improve performance. Backups for smaller tables complete faster and require less space. REORG can purge old data using DISCARD conditions. This way you can trim down critical tables and improve access performance. The purged data is written to sequential files that can be used to load the data into history tables or reload back to the discarded tables.

Purging Programs
Prior to Db2 version 8, you could run REORG DISCARD in SHRLEVEL NONE only. Prolonged execution affected availability and failures affected recoverability. Incorrect restart could have been resulted in loss of data. For this reason, administrators prohibited the usage of REORG DISCARD but also avoided purging data using their own authority to protect themselves from being blamed for accidental deletion of business critical data. In many cases, developers were instructed to write programs to delete rows. The programs had to commit the work every few minutes to release the locks and allow restart from the most recent commit point. These programming requirements made the development of purging programs complex. In addition, purging programs using SQL to delete rows, also impact logging because every deleted row is recorded in the Db2 log for recovery purposes. REORG DISCARD, on the other hand, allows you to use the LOG NO option and avoid the logging overhead. More important is that if data is deleted accidently, you can easily reload back the discarded data using the discard datasets created by REORG DISCARD.

Preparing Discard Conditions
Administrators can have better control over data purging if they allow users to predefine discard conditions and trigger the discard process when needed but leave the operability of REORG jobs in the hands of the administrators. This can be achieved by managing a repository of discard conditions accessible to users where they can define how the data should be pruned. Discard conditions are somewhat limited because you cannot use expressions or sub-queries. It is possible to extend discard conditions predicates by using a query that is executed prior to REORG DISCARD. The result set can be used to generate the desired discard conditions. Finally the REORG statement with the desired discard conditions should be prepared before execution. The users should be able to trigger the execution of the discard process under the supervision of administrators. This way, if the process fails, administrators are still responsible to restart the process or recover the data.


Recording Discard Datasets
Discard datasets should also be recorded in a repository to allow you to easily recall discarded data. The datasets names can be identified by analyzing the output of the REORG utility. However, the output might be lost if the job is cancelled or when the job abends. The output might also be partial after restart. However, you can use the templates definitions to identify the datasets by resolving the symbols for all discarded objects and identifying the last dataset allocated by the template in use. This way there is no need to analyze or capture the output of the reorg step. Recording the datasets in a repository allows you to quickly identify all the discard datasets that were used to discard the data and generate LOAD jobs to load the data into history tables for archiving or reload back to the discarded tables for recall.

Discarding Dependent Rows
When discarding data from a parent table, you cannot always use the same discard conditions to remove the dependent rows from the dependent tables. To do that you need to first discard the parent rows using REORG DISCARD and then use the CHECK DATA utility with the DELETE YES option to remove the dependent rows from the dependent tables. CHECK DELETE required you to create a matching exception table for all dependent tables. The exception table should be created exactly like the dependent table and the additional RID and TIMESTAMP columns can be added. The exception table must be also created in a tablespace with the same page size and attributes as the dependent table tablespace. The primary allocation size of the exceptions tablespace should accommodate for at least 20 percent of the rows in the dependent table. The Exception tables are dropped automatically when empty or when expired. The CHECK DELETE statements should be generated before reorg discard and executed after reorg discard. In addition, if some dependent tables are also purged using REORG DISCARD, you can avoid running CHECK DELETE but you might need to use the REPAIR SET NOCHECKPEND to remove the restricted state. While REORG DISCARD can create an inline copy, CHECK DELETE YES LOG NO might leave the tablespace in COPY PENDING, and an image copy is required to ensure recoverability for dependent tables.