Postgres Delete/Restore by Table Properties
Using triggers, key generators and a second table for delete/restore functionality
Table Of Contents
Today I Explained
You’ve likely encountered tables where the records within the table are treated as immutable, creating entirely new entries when a change is made. These are versioned tables, and are one way to provide a versioning scheme within a database like Postgres. Typically this is implemented within the table itself, by having a table in which all versions exist, using a VersionID
(or similar ID) to distinguish between the versions:
SELECT ID, VersionID, Name, ....
FROM table;
This does work, and has the added benefit of keeping foreign keys simple, as you preserve the original relationships between tables within the database. The challenge with this, is that we are now complicating the querying logic of everything related to this table, as it must now handle the addressing of $Latest
for each object contained within the table. This can be especially painful when trying to do joins across multiple tables, and needing to factor in the most recent state of the world.
One of the alternative approaches that can arise with this is the adoption of a multi-table system to facilitate versioning. A primary table (${TABLE}
) that contains the latest state of the world, with the most recent version of each object contained within the table. Any modifications to this table will, such as UPDATE
or DELETE
operations, see copies of each record published into a versioning table (${TABLE}_history
). The generation of these tables & related triggers can be handled by annotating models within the database with [VersionedTable]
or [VersioningBehaviour(Capabilities.SINGLE)]
.
An update will see the previous record copied to the versioning table, before the update replaces it. The similar behaviour can happen with a delete operation, or alternatively the delete can prune all resources from the versioning table with it. As this is handled by triggers, the computation of the VersionID
needs only take into account the most recent version, followed by a generation of a new identifier.
It is no longer necessary to have involved querying logic for the primary table, and the versioned table can support its own interface that takes into consideration the historical nature of the table.
A note on storage
One of the pain points with this approach, is that you are now potentially storing a significant amount of extra data with each modification of the history. A single to a single number field within a record, means that all other properties (e.g. large text) have a duplicate entry in the database. This is just a general pain to deal with.
An option that may assist with this is working with a column-specific trigger, which will fire when any of its columns are listed as targets in the UPDATE
command’s SET
list. Although, since this is working with SQL code generators, it may be an option to simply generate stored procedures that handle this as-needed column versioning.
A note on Version Normal Form
Version Normal Form is a way of structing the tables in which static (unchanging data) is stored in a top level non-volatile table. This can look something like:
-- Omitting types for brevity
CREATE TABLE Records(
ID -- ...
StaticField1 -- Static data
-- ...
StaticFieldN -- Static data
)
The volatile attributes are then set within a separate table known as RecordsVersions
, which will need to be tracked over the lifetime.
-- Omitting types for brevity
CREATE TABLE RecordsVersions(
ID -- ...
Effective -- The effective data of the versions (Combination of ID & Effective are unique)
-- ...
VolatileField1 -- Volatile data
-- ...
VolatileFieldN -- Volatile data
)
This then has the queries use both tables, to determine the effective version ($Latest
) using the ID
and Effective
date.