GnuCash  5.6-118-gf3b203f4db+
Globally Unique Identifiers: Design Issues
               Linas Vepstas November 2003

API: GUID

Summary

GUID's are meant to uniquely identify a GnuCash object. However, when that object is backed up (by copying a GnuCash data file), the GUID no longer identifies a single object. Book closing is a formalized means of saving older data. If a GnuCash data set is large, and SQL is the storage backend, then making copies of old data can be very wasteful of storage space. However, book closing, by definition, must retain a copy of the data 'as it once was', as an auditable historical record. This document discusses alternative schemes for resolving the uniqueness, searchability, copying and storage-space issues that come up when handling GUID's on book closing.

Introduction

GUID's are meant to be a way of identifying a given GnuCash entity. Accounts, transactions, splits, prices and lots all have GUID's. GUID's can be used as a reference: by knowing a GUID, the matching entity can be found. Because GUID's are 128 bits long, one could have a billion different GnuCash users in a million different solar systems without worrying about accidentally assigning the same GUID to two different objects. So, given a GUID, one should be able to come up with the object uniquely identified by it, right?
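
As a minimal sketch of what 'finding the matching entity by its GUID' means, a GUID can be modeled as 16 random bytes used as the key of a lookup table. Python is used here only for brevity; the names new_guid, Entity, registry, register and lookup are hypothetical and are not part of the GnuCash API.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional

GUID_BYTES = 16  # 128 bits


def new_guid() -> str:
    """Return a random 128-bit identifier as 32 hex digits (illustrative only)."""
    return secrets.token_bytes(GUID_BYTES).hex()


@dataclass
class Entity:
    guid: str
    kind: str  # e.g. "account", "transaction", "split", "price", "lot"
    name: str


# Hypothetical in-memory table mapping GUID -> entity.
registry: Dict[str, Entity] = {}


def register(kind: str, name: str) -> Entity:
    """Create an entity with a fresh GUID and store it in the table."""
    entity = Entity(new_guid(), kind, name)
    registry[entity.guid] = entity
    return entity


def lookup(guid: str) -> Optional[Entity]:
    """Return the entity identified by guid, or None if it is unknown."""
    return registry.get(guid)


checking = register("account", "Checking")
found = lookup(checking.guid)  # the very same object that was registered
```

Within a single table like this, the lookup is unambiguous; the rest of this document is about what happens once more than one copy of the table exists.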

One practical problem is that backup copies of files containing GnuCash data will have the same GUID's as the 'live data'. Because the user may have modified the live object since the backup was made, the backup copy holds more-or-less the same object, but possibly with different values (for the amount, the value, the date, the title, etc.). Thus, a single value for a GUID can be associated with several different but similar 'objects'.
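
The backup problem can be illustrated with a small sketch. A 'book' is modeled here as a plain table from GUID to a transaction record; the GUID value, the field names and the variable names are all made up for illustration and do not reflect the GnuCash file format.

```python
import copy

# Hypothetical 32-hex-digit GUID; real values are random.
GUID = "0123456789abcdef0123456789abcdef"

# The live data: one transaction, keyed by its GUID.
live_book = {GUID: {"description": "Groceries", "amount": "42.00"}}

# Backing up the file copies every object together with its GUID.
backup_book = copy.deepcopy(live_book)

# The user later edits the live transaction.
live_book[GUID]["amount"] = "45.50"

# The same GUID now resolves to two similar but different objects.
same_guid_in_both = GUID in live_book and GUID in backup_book
objects_differ = live_book[GUID] != backup_book[GUID]
```

After the edit, both flags are true: the GUID alone no longer says which of the two objects is meant, unless one also knows which copy of the data to look in.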

A related practical problem occurs with the 'closing of books', which is a certain formalized way of making a kind of backup. After one has accumulated a lot of financial data, one has to periodically make a permanent record of that data, and then weed through it to throw away old (no longer interesting) transactions. The 'closed book' contains all of the old transactions, while the 'open book' looks just like it (has the same accounts, etc), but doesn't have the old transactions.
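
The split into a 'closed book' and an 'open book' can be sketched as follows. This is not the GnuCash book-closing code; close_book and the dictionary layout are hypothetical, and the choice to treat transactions dated strictly before the closing date as 'old' is an assumption made for the example.

```python
import copy
from datetime import date


def close_book(book, closing_date):
    """Split book into (closed, open) at closing_date.

    Both results get their own copy of the account tree. Transactions
    dated before closing_date go to the closed book (an assumption of
    this sketch); all others stay in the open book.
    """
    closed = {"accounts": copy.deepcopy(book["accounts"]), "transactions": []}
    still_open = {"accounts": copy.deepcopy(book["accounts"]), "transactions": []}
    for txn in book["transactions"]:
        target = closed if txn["date"] < closing_date else still_open
        target["transactions"].append(txn)
    return closed, still_open


book = {
    "accounts": ["Assets:Checking", "Expenses:Food"],
    "transactions": [
        {"date": date(2002, 6, 1), "description": "Old lunch"},
        {"date": date(2003, 6, 1), "description": "Recent lunch"},
    ],
}

closed_book, open_book = close_book(book, date(2003, 1, 1))
```

Here closed_book and open_book end up with identical account trees, while each transaction lives in exactly one of the two books.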

The Issue

The current book-closing code makes a copy of the account tree, and sorts all transactions, by date, into the new or the old account tree. To avoid confusing the new and the old account trees, the book-closing code issues the old accounts a new set of GUID's. The pros and cons of this scheme:

Pro: The 'old', closed accounts can be uniquely accessed according to their GUID's, without causing confusion with similar/same.