(10 intermediate revisions by 5 users not shown) |
Line 1: |
Line 1: |
− | This article collects some notes about the XML file format of [[GnuCash]]. So far it is just descriptive, and neither normative nor authoritative. | + | [[Category:XML]] |
| + | This article collects some notes about the XML file format of GnuCash. It is descriptive, and neither normative nor authoritative. |
| | | |
− | Beginning with version 1.6, the primary GnuCash storage mechanism is an [[Wikipedia:XML|XML]] file. The file is optionally compressed with [[Wikipedia:gzip|gzip]] (“<u>E</u>dit” menu → “Preferences” → “General” → “Use file compression”). | + | Beginning with version 1.6, the primary GnuCash storage mechanism is an XML file. The file is optionally compressed with gzip; the preference is set at <u>E</u>dit→Preferences→General→Use file compression. |
| | | |
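Because the compression preference may be on or off, a script that reads a GnuCash file cannot know in advance which form it will get. The following is a minimal Python sketch, not part of GnuCash, that checks for the two-byte gzip signature and decompresses only when it is present. The function names and the file name in the usage note are hypothetical; the root element name <code>gnc-v2</code> is taken from the XPath expression used elsewhere in this article.

```python
import gzip
import xml.etree.ElementTree as ET
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def read_gnucash_xml(path):
    """Return the raw XML bytes, decompressing only if the file is gzipped."""
    raw = Path(path).read_bytes()
    if raw.startswith(GZIP_MAGIC):
        raw = gzip.decompress(raw)
    return raw


def load_root(path):
    """Parse the file and return the root element (expected to be gnc-v2)."""
    return ET.fromstring(read_gnucash_xml(path))
```

For example, <code>load_root("books.gnucash").tag</code> should give the root element name for either a compressed or an uncompressed file.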
− | There is a non-normative [[Wikipedia:RELAX NG|RELAX NG]] schema for the 1.8/2.0 XML file format ([http://svn.gnucash.org/trac/browser/gnucash/trunk/src/doc/xml/gnucash-v2.rnc src/doc/xml/gnucash-v2.rnc]). There were also DTD schema definitions, but these are outdated and do not define the current format correctly ([http://svn.gnucash.org/trac/browser/gnucash/branches/1.8/src/doc/xml src/doc/xml (1.8)]).
| + | [{{URL:git}}gnucash/blob/stable/libgnucash/backend/xml/DTD/gnucash-v2.rnc gnucash-v2.rnc] is a non-normative RELAX NG schema for the XML file format. There are also DTD schema definitions in [{{URL:git}}gnucash/tree/stable/libgnucash/backend/xml/DTD libgnucash/backend/xml/DTD], but these are outdated and do not define the current format correctly. |
| | | |
− | Please keep in mind that GnuCash series 1.8.x uses the libxml1 library for XML access, whereas 1.9.0 and later uses the libxml2 library. Some behaviour regarding XML files is therefore quite different in 1.8.x compared to 1.9.x/2.0.0.
| + | Many elements in the XML file are identified by Globally Unique Identifiers (GUID). GnuCash includes its own GUID implementation. |
− | | |
− | XML files written by all GnuCash 1.8.x versions are missing [http://www.w3.org/TR/REC-xml-names/#ns-decl XML namespace declarations] that are required by most XML processing software (see also [[FAQ#Q: How can I export data?]]). See [http://www.gnucash.org/docs/v1.8/C/gnucash-guide/appendixa_xmlconvert1.html GnuCash Tutorial and Concepts Guide, Appendix A, part 5: Converting XML GnuCash File] for the missing declarations. From version 1.8.5 onwards GnuCash is able to ''read'' XML files containing these declarations ([http://mail.gnome.org/archives/gnome-announce-list/2003-August/msg00070.html 1.8.5 release notes]). From 1.9.0 onwards GnuCash will write the required namespace declarations as well.
| |
− | | |
− | Many elements in the XML file are identified by [[Wikipedia:Globally Unique Identifier|GUID]]. GnuCash includes its own GUID implementation ([http://svn.gnucash.org/trac/browser/gnucash/branches/1.8/src/engine/guid.h guid.h], [http://svn.gnucash.org/trac/browser/gnucash/branches/1.8/src/engine/guid.c guid.c] (1.8); | |
− | [http://svn.gnucash.org/trac/browser/gnucash/branches/2.2/lib/libqof/qof/guid.h guid.h],
| |
− | [http://svn.gnucash.org/trac/browser/gnucash/branches/2.2/lib/libqof/qof/guid.c guid.c] (2.2)).
| |
| | | |
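As an illustration of what such identifiers look like in a data file, here is a small Python sketch. It assumes that a GUID is written as 32 lowercase hexadecimal digits, which matches the account id <code>0d69c3557f4d9340198bfd151f9e13cb</code> used in the script from the earlier revision of this page, but which is not stated by the schema text here. The helper names are hypothetical, and <code>uuid.uuid4()</code> is only a stand-in for testing; it is not GnuCash's own GUID implementation.

```python
import re
import uuid

# Assumption: GUIDs appear in the file as exactly 32 lowercase hex digits.
GUID_PATTERN = re.compile(r"[0-9a-f]{32}")


def looks_like_guid(value):
    """Return True if value has the assumed 32-hex-digit shape."""
    return GUID_PATTERN.fullmatch(value) is not None


def new_guid_like_id():
    """Produce a random identifier of the same shape (stand-in only)."""
    return uuid.uuid4().hex
```

A check like this can catch truncated or hand-edited ids before a file is loaded, but it says nothing about whether the id actually refers to an existing object.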
| ==Character encoding== | | ==Character encoding== |
− | GnuCash 1.8.x interprets XML documents using a character encoding determined by operating-system–level locale settings, and so does not include an [http://www.w3.org/TR/REC-xml/#NT-EncodingDecl encoding declaration] in the opening [http://www.w3.org/TR/REC-xml/#sec-TextDecl XML text declaration]. (The locale setting here constitutes a “higher-level protocol” in W3C vernacular [http://www.w3.org/TR/REC-xml/#charencoding].) GnuCash serializes non-[[Wikipedia:ASCII|ASCII]] octets (i.e. those with the high-order bit set) as decimal numeric character references. (E.g., an em-dash is represented as “<code>&#8212;</code>”.)
| |
− |
| |
− | On the other hand, GnuCash 1.9.0 and later writes the XML document always in UTF-8 encoding and also includes the appropriate encoding declaration in the opening XML text declaration. (I think the serialization is still done as decimal numeric character references but this has to be checked.)
| |
| | | |
− | For example, in 1.8.x the UTF-8 encoding of the Cyrillic capital letter “Б” is written as “<code>&#208;&#145;</code>”. As the following Python script shows, the UTF-8 text should be transcoded to recover the original Unicode text. (This script uses the [http://4suite.org/ 4Suite] XML library.)
| + | Since version 1.9.0, GnuCash writes the XML document in UTF-8 encoding and includes the appropriate encoding declaration in the opening XML text declaration. |
− | | |
− | <pre>#! /usr/bin/python2.4
| |
− | | |
− | from Ft.Xml.Domlette import NonvalidatingReader
| |
− | from Ft.Xml.XPath import Evaluate
| |
− | from Ft.Xml.XPath.Context import Context
| |
− | | |
− | # precondition: foo.xac was created by GnuCash with LANG=en_US.UTF-8
| |
− | doc = NonvalidatingReader.parseUri('file:///tmp/foo.xac')
| |
− | context = Context(doc, processorNss={'cd' : "http://www.gnucash.org/XML/cd",
| |
− | 'book' : "http://www.gnucash.org/XML/book",
| |
− | 'gnc' : "http://www.gnucash.org/XML/gnc",
| |
− | 'cmdty' : "http://www.gnucash.org/XML/cmdty",
| |
− | 'trn' : "http://www.gnucash.org/XML/trn",
| |
− | 'split' : "http://www.gnucash.org/XML/split",
| |
− | 'act' : "http://www.gnucash.org/XML/act",
| |
− | 'price' : "http://www.gnucash.org/XML/price",
| |
− | 'ts' : "http://www.gnucash.org/XML/ts",
| |
− | 'slot' : "http://www.gnucash.org/XML/kvpslot",
| |
− | 'cust' : "http://www.gnucash.org/XML/cust",
| |
− | 'addr' : "http://www.gnucash.org/XML/custaddr"})
| |
− | | |
− | accountName = Evaluate('/gnc-v2/gnc:book/gnc:account[act:id="0d69c3557f4d9340198bfd151f9e13cb"]/act:name/text()',
| |
− | context=context)[0]
| |
− | | |
− | # object of type "str" (is actually UTF-8–encoded, not latin1!):
| |
− | name_raw = accountName.data.encode('latin1')
| |
− | | |
− | # object of type "unicode":
| |
− | name_unicode = name_raw.decode('utf-8')
| |
− | | |
− | # objects of type "str":
| |
− | name_koi8r = name_unicode.encode('koi8-r')
| |
− | name_utf8 = name_unicode.encode('utf-8')
| |
− | name_utf16 = name_unicode.encode('utf-16')
| |
− | | |
− | assert name_utf8 == accountName.data.encode('latin1')</pre>
| |
| | | |
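For files written by 1.8.x under a UTF-8 locale, the transcoding step from the earlier Python 2 script can be sketched in Python 3 as follows. This is an illustrative helper, not GnuCash code. It assumes an XML parser has already resolved the character references, so that “<code>&#208;&#145;</code>” arrives as the two code points U+00D0 and U+0091; the function name is hypothetical.

```python
def recover_text(parsed):
    """Undo the 1.8.x byte-per-reference serialization of UTF-8 text.

    Each code point below U+0100 is treated as one raw octet, and the
    resulting octets are decoded as UTF-8. Text that does not fit this
    pattern is returned unchanged.
    """
    try:
        return parsed.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either a code point above U+00FF (already real Unicode) or a
        # byte sequence that is not valid UTF-8: leave the text as it is.
        return parsed
```

For example, <code>recover_text("\u00d0\u0091")</code> yields the Cyrillic capital letter “Б” (U+0411). The fallback is a heuristic: a string that happens to be both valid Latin-1 text and a valid UTF-8 byte sequence would be converted as well, so this should only be applied to files known to come from 1.8.x.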
| ==Validation== | | ==Validation== |
| | | |
− | The RELAX NG schema file mentioned above can be used to validate your GnuCash file. What you need to do so: | + | The RELAX NG schema file mentioned above can be used to validate an uncompressed GnuCash XML data file. This requires that you: |
− | * your GnuCash data file in uncompressed format | + | * save your GnuCash data file in uncompressed format |
− | * a validator. [http://code.google.com/p/jing-trang/ Jing] will be used in this example. | + | * use an XML validator--e.g., [{{URL:GH}}relaxng/jing-trang Jing], which will be used in this example. |
| | | |
− | As stated above, the GnuCash data file can be gzip compressed. Jing will not work on such a file. So you first have to get your data file in an uncompressed state. The easiest way to do this is to load (a copy of) your data file in GnuCash, edit the preference to '''not''' use gzip compression and save your file. (Remember to reset the preference afterwards.) | + | As stated above, the GnuCash data file is by default stored using gzip compression. Jing will not work on such a file, so you must first save your data file in an uncompressed state. The easiest way to do this is to change the storage preference and save your file. (Remember to reset the preference afterwards.) |
| | | |
| Then download jing and run the following command: | | Then download jing and run the following command: |
− | jing -c path-to-gnucash-v2.rnc path-to-your-datafile.gnucash
| + | <Syntaxhighlight lang="sh"> |
− | | + | jing -c path-to-gnucash-v2.rnc path-to-your-datafile.gnucash |
− | jing will report any validation errors it finds in a clear manner. | + | </Syntaxhighlight> |
− | | + | jing will report any validation errors it finds. |
− | ;Note:The validation should not be considered authoritative as the schema is not updated and tested very often. So validation errors can just as easily be due to errors in the schema as to errors in the data file.
| |
| | | |
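If you prefer not to change the preference in GnuCash, the same preparation can be scripted. The following Python sketch writes an uncompressed copy of the data file and then invokes the command shown above. It assumes a <code>jing</code> launcher is available on the PATH (Jing itself is distributed as a Java archive, so this depends on your installation); the function names are hypothetical.

```python
import gzip
import shutil
import subprocess
from pathlib import Path


def write_uncompressed_copy(source, target):
    """Copy source to target, decompressing it if it is gzip-compressed."""
    raw = Path(source).read_bytes()
    if raw[:2] == b"\x1f\x8b":  # gzip signature
        raw = gzip.decompress(raw)
    Path(target).write_bytes(raw)
    return Path(target)


def jing_command(schema, datafile):
    """Build the argument list for: jing -c SCHEMA DATAFILE"""
    return ["jing", "-c", str(schema), str(datafile)]


def validate(schema, datafile):
    """Run jing and return the completed process with its captured output."""
    if shutil.which("jing") is None:
        raise FileNotFoundError("no 'jing' executable found on PATH")
    return subprocess.run(
        jing_command(schema, datafile), capture_output=True, text=True
    )
```

Working on a copy leaves the original data file and the compression preference untouched. Any messages from jing are available in the <code>stdout</code> and <code>stderr</code> attributes of the returned object.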
− | ''Based on information provided by Baptiste Carvello in [https://bugzilla.gnome.org/show_bug.cgi?id=680887 bug 680887].''
| + | ;Note:The validation should not be considered authoritative, as the schema is not updated or tested very often. So validation errors can just as easily be due to errors in the schema as to errors in the data file. |
| | | |
− | ==See also== | + | ''Based on information provided by Baptiste Carvello in [{{URL:Bugs}}show_bug.cgi?id=680887 bug 680887].'' |
− | * [[List of external software interfaces]]
| |
− | * [[Development]]
| |
− | ** [[Building]]
| |
− | ** [[Subversion]]
| |
− | ** [[Bugzilla]]
| |
| | | |
| ==External links== | | ==External links== |
− | * http://qof.sourceforge.net/ - QOF is the object persistence layer used by GnuCash | + | * https://gnucashtoqif.us/ - GnuCash XML → [{{URL:wp}}QIF QIF] conversion tool |
− | ** [[User:Jsled]] That's slightly misleading in the context of this page; for instance, when writing out the data to the current XML format, QOF isn't used at all.
| + | *: [https://gnucashtoqif.us/#mozTocId164261 Notes about file format] |
− | * http://gnucashtoqif.sourceforge.net/ - GnuCash XML → [[Wikipedia:QIF|QIF]] conversion tool
| + | * [{{URL:WA}}20070219085556/http://edseek.com/archives/2005/08/18/gnucash-export-to-gnumeric-and-csv/ GnuCash export to Gnumeric and CSV], using [{{URL:wp}}XSLT XSLT] -- '''Note:''' With '''GnuCash 3.2''', users can export to CSV directly from the program. |
− | ** [http://gnucashtoqif.sourceforge.net/#mozTocId164261 notes about file format] | |
− | * [http://edseek.com/archives/2005/08/18/gnucash-export-to-gnumeric-and-csv/ GnuCash export to Gnumeric and CSV], using [[Wikipedia:XSL Transformations|XSLT]] | |
− | * Relevant mailing list threads
| |
− | ** [http://lists.gnucash.org/pipermail/gnucash-devel/2002-March/thread.html#5750 March 2002]
| |
− | * [http://bugzilla.gnome.org/buglist.cgi?query_format=advanced&short_desc_type=allwordssubstr&short_desc=&product=GnuCash&component=XML+Backend&long_desc_type=allwordssubstr&long_desc=&status_whiteboard_type=allwordssubstr&status_whiteboard=&keywords_type=allwords&keywords=&bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&bug_status=NEEDINFO&bug_status=VERIFIED&emailtype1=substring&email1=&emailtype2=substring&email2=&bugidtype=include&bug_id=&chfieldfrom=&chfieldto=Now&chfieldvalue=&cmdtype=doit&order=Reuse+same+sort+as+last+time&field0-0-0=noop&type0-0-0=noop&value0-0-0= non-closed, non-resolved GnuCash bug reports pertaining to the XML backend]
| |