2021-07-04 GnuCash IRC logs

00:03:57 *** celeste has quit IRC
00:05:47 *** Aussie_matt_ has joined #gnucash
00:06:09 *** Aussie_matt has quit IRC
00:15:20 *** Mechtilde has joined #gnucash
01:08:53 *** AstronautSurfer has quit IRC
01:11:20 *** fell has quit IRC
01:12:39 *** fell has joined #gnucash
01:12:39 *** ChanServ sets mode: +o fell
01:31:23 <chris> @tell gjanssens branch updated.
01:31:23 <gncbot> chris: The operation succeeded.
01:40:16 *** frakturfreak has quit IRC
01:54:55 *** frakturfreak has joined #gnucash
01:57:35 *** celeste has joined #gnucash
01:57:35 *** ChanServ sets mode: +v celeste
01:58:54 *** Aussie_matt_ has quit IRC
02:14:12 *** CDB-Man has joined #gnucash
02:14:12 *** ChanServ sets mode: +v CDB-Man
02:16:38 *** CDB-Man_ has quit IRC
02:46:13 *** sbluhm has joined #gnucash
02:52:10 *** Bambuzel has joined #gnucash
02:52:10 *** ChanServ sets mode: +v Bambuzel
02:59:18 *** gjanssens has joined #gnucash
02:59:18 *** ChanServ sets mode: +o gjanssens
03:03:28 *** sbluhm has quit IRC
03:25:10 *** Bambuzel has quit IRC
03:40:11 *** sbluhm has joined #gnucash
04:12:25 *** sbluhm has quit IRC
04:41:34 *** sbluhm has joined #gnucash
04:45:22 *** kcin has quit IRC
05:05:51 *** User has joined #gnucash
05:18:22 *** sbluhm has quit IRC
05:47:41 *** gjanssens has quit IRC
05:48:01 *** gjanssens has joined #gnucash
05:48:01 *** ChanServ sets mode: +o gjanssens
06:38:15 *** sbluhm has joined #gnucash
06:51:58 *** kcin has joined #gnucash
08:12:44 *** sbluhm has quit IRC
08:22:33 <chris> gjanssens: still not 100% satisfactory: (1) the Prepayment sign is wrong in Vendor, correcct in Customer. We could use abs() but I'd rather not.
08:23:36 <chris> (2) the LHS Invoice -> RHS Payment can report correctly when the payment txn has multiple non-APAR splits. the LHS payment -> RHS payment can't.
08:37:39 *** field^Zzz3 has joined #gnucash
09:37:12 <chris> (1) has been fixed. I think we're ready to create tests and merge in.
09:44:28 <Simon> https://bugs.gnucash.org/show_bug.cgi?id=797535 happens for me too on every execution; I think c_formats is being destroyed twice
09:50:11 <Simon> https://s85.org/DAyPGcka happens first in gnc_shutdown() -> exit(), and then again via _dl_fini()
09:50:36 <Simon> both the same object
09:54:24 <Simon> https://s85.org/FU0bUtf6 it's also being constructed twice, the second time being when opening libgncmod-app-utils.so
09:55:36 *** sbluhm has joined #gnucash
10:00:13 *** AstronautSurfer has joined #gnucash
10:00:13 *** ChanServ sets mode: +v AstronautSurfer
10:01:56 *** qwer has joined #gnucash
10:08:15 <Simon> I've created bug https://bugs.gnucash.org/show_bug.cgi?id=798229
10:37:04 <chris> jralls warlord would it be the long-term view that instead of xaccAccountGetSplitList (acc) we'd need to exec.sql("SELECT transactions FROM accounts WHERE account.name == 'Bank'") ???
10:42:27 *** jervin has joined #gnucash
10:50:18 *** kcin1 has joined #gnucash
10:52:31 *** kcin has quit IRC
10:52:46 *** kcin1 is now known as kcin
11:07:03 <warlord> chris, No, I think we would still use those accessor functions, but under the covers it would perform such a SELECT.
11:18:51 *** sbluhm has quit IRC
11:28:49 *** sbluhm has joined #gnucash
11:34:31 *** storyjesse has joined #gnucash
11:39:55 *** qwer has quit IRC
11:40:43 *** storyjesse has quit IRC
11:41:42 *** storyjesse has joined #gnucash
11:53:21 <jralls> Simon, looks like https://stackoverflow.com/questions/35229310/sigsegv-in-dl-fini. What's the provenance of your GnuCash instance?
11:57:56 <jralls> chris, warlord is correct, queries will be abstracted so that higher-level code doesn't depend on the database implementation or schema.
12:15:52 *** storyjesse has quit IRC
12:16:04 *** sbluhm has quit IRC
12:25:32 <Simon> jralls: I built it myself
12:26:22 <jralls> Do you have a distro-supplied GnuCash installed too?
12:26:36 <Simon> no
12:29:31 <jralls> OK. Then the problem is likely something to do with libgncmod-app-utils.so's linking libgnc-engine.so.
12:29:53 <jralls> Odd, though, that the crash isn't more common.
12:30:51 <Simon> you would only see it if you ran gnucash from a console
12:31:16 <Simon> crashes on exit don't really appear anywhere otherwise
12:35:27 *** qwer has joined #gnucash
12:40:05 <jralls> I run GnuCash from the command line rather often. In fact I run it in the debugger rather often.
12:42:22 <jralls> And doing so just now on F33 I don't get the second construction of the static std::vector<GncDateFormat>, just the one from glibc. Did you do anything with GnuCash or did you just start and quit?
12:43:28 *** sbluhm has joined #gnucash
12:43:35 <Simon> I just started it with --nofile and then quit
12:46:11 <Simon> I'm using Ubuntu 18.04
12:47:05 <Simon> that's the only object that glibc can identify a double free on at exit, if I make static copies of instead then nothing else goes wrong
12:47:52 <jralls> OK, I'll try there. Did you start from absolutely clean, meaning make/ninja uninstall && rm -rf * && cmake... && make/ninja && make/ninja install?
12:48:04 <jralls> And which are you using, make or ninja?
12:48:21 <Simon> I'm using make
12:48:41 <Simon> I did a make clean but I haven't removed the installed files
12:53:54 <jralls> Did you run the debugger against the installed GnuCash or the one in the build directory?
12:55:18 <Simon> the installed gnucash
12:56:09 <jralls> OK. I've switched VMs to Ubunutu 18.04, building now.
12:57:08 <Simon> deleting everything and rebuilding has fixed it :/
12:59:27 <jralls> Interesting. The only surprising thing about that is that loading libgncmod-app-utils.so did a second construction/destruction even though libgnc-engine.so was replaced.
13:02:43 <jralls> I'm inclined to blame the GCC linker or link editor, but maybe there's some cmake incantation that we're missing.
13:03:00 <Simon> I have way too many .so files in the broken version
13:03:30 <jralls> Oh really? Where are they and what do the duplicates look like?
13:03:53 <Simon> https://s85.org/PoAnNXzB
13:04:36 <Simon> I've got lib/gnucash/libgncmod-engine.so and lib/libgnc-engine.so
13:04:51 <Simon> did they move between 3.8b and 4.6?
13:05:36 <Simon> I've got gnucash-7633-798229/lib/gnucash/libgncmod-engine.so from a week ago, which will be from 3.8b
13:07:46 <jralls> Ahah! Yes, we changed from dlopening all of the core libraries to linking them directly into the executable so libgncmod-engine.so in GC3 became libgnc-engine.so in GC4.
13:09:10 <jralls> Same for all of those other "extra" libgncmod*.so in your list.
13:09:33 <Simon> so it's still loading all of the old modules
13:10:29 <jralls> Right. The startup code loads everything it finds in <libdir>/gnucash.
13:12:43 <jralls> So you had the same static defined in libgnc-engine.so and libgncmod-engine.so leading to a double free. Also lots of violations of the one definition rule and resulting undefined behavior.
13:19:03 <jralls> I wonder if there's a good way to detect that those old modules have gotten loaded and quit with an error message.
13:20:58 <Simon> you could have a list of disallowed names
13:21:37 <Simon> or detect when functions are redefined
13:25:44 <jralls> I don't think dlopen knows how to do the latter, but gnc_module has some introspection built into it so I think we can test for the module initialization functions.
13:26:44 * Simon waits for callgrind to finish
13:27:07 <Simon> is it a known problem that xml files take 4x longer to load?
13:31:03 <Simon> (compared to 2.6)
13:33:04 <jralls> There was a long running bug about files taking a long time to load on Windows in particular that was caused by a combination of a stupid Windows redraw strategy and GnuCash updating the progress bar too often. That caused the whole window to get redrawn several thousand times per second. I cut the number of updates by 1/1000 and got the load times down to a couple of seconds even for very large files. I think that went into 4.5.
13:35:15 <jralls> There hasn't been any big change to the XML backend except changing compilers from C to C++. The big difference is that GC3 uses Gtk3 which changed the drawing code a bit, but nothing like what Gtk4 does.
13:37:47 <jralls> Do you see the problem now that you've got the module problem sorted and if so is profiling in your skill set?
13:38:36 *** AstronautSurfer has quit IRC
13:38:44 <Simon> I still have the same loading time after fixing the modules
13:39:02 <Simon> the last time I profiled this on 2.x I took out the utf8 verification and to make it a bit faster
13:39:19 <Simon> but 4x slower implies that something else is significantly slowing it down
13:39:41 <Simon> there is a point in the loading process where GnuCash opens all of the tabs that has no progress indicator (it just sits at 0%)
13:41:43 <Simon> there are bits of the code that haven't changed since 2.x that I think could be made faster... finding tags by strcmp loop isn't optimal
13:41:47 <jralls> Loading the file happens with the splash screen showing. If it's slow loading a tab then it's that tab's process that's slow. Is it a report tab?
13:42:04 <Simon> the issue is not so much that any one tab is slow but that it shows no progress
13:42:16 *** chris_ has joined #gnucash
13:42:17 *** gncbot sets mode: +o chris_
13:42:31 *** chris has quit IRC
13:42:38 <Simon> on 2.x I could get it to show titles in the splash screen if I reopened the file while the app was running, but that doesn't happen in 4.6
13:43:39 <Simon> what do you consider to be a "very large" file? mine is 59MB and has 23776 transactions
13:43:51 <jralls> Finding anything by strcmp loop isn't optimal. Neither is using strings where one should use enums. PRs most welcome for either, but beware that it's easier said than done thanks to spaghetti ala GnuCash.
13:44:30 <jralls> That's about half of what mine has. Business users have much bigger ones.
13:44:31 <Simon> the biggest problem is not changing that code to be faster, it's doing it in a readable way
13:45:31 <Simon> because to be fast the code needs to check individual characters and be hard-coded for specific strings
13:45:54 *** celeste has quit IRC
13:46:02 <Simon> changing the data structure is likely to be slower when there are so few options
13:46:32 <Simon> https://github.com/smilingthax/cttrie might work
13:46:42 <Simon> but I think the main performance problem will be somewhere else
13:47:10 <jralls> Eww.
13:47:13 <Simon> and of course it really needs to try and do as much parsing as possible in multiple threads
13:47:37 <Simon> I don't see a better option - implementing a trie for 10 items will probably be slower
13:47:51 <jralls> Multiple threads is a huge problem for GnuCash. It's littered with statics.
13:47:57 <Simon> strcmp is a waste of time but it's still fast
13:48:47 *** jervin has quit IRC
13:49:24 <jralls> Right. The fix is to do ptr or int compares instead of strcmp. GnuCash has a string cache that can be used for ptr compares and GLib provides quarks to convert strings to ints.
13:51:01 <jralls> It's a lot of tedious work to track down every usage of a static string and convert it, so nobody's been willing to do it.
13:51:46 <jralls> I guess it's also worth noting that those strcmp loops don't show up in profiling so there's a strong argument that effort is better directed elsewhere.
13:52:31 <Simon> well the input is XML so it has to check a string somewhere?
13:52:59 <Simon> I looked at quarks but the code to set it up is currently static so that would have to be moved
13:54:18 <jralls> Parsers are another matter entirely. Yes, parsing a text file requires lots of strcmps. There's no way around that.
13:54:58 <jralls> IMO that's best left to specialized libraries like libxml2 or boost::asio.
13:55:13 <Simon> not if the code is doing switch (text[0]) { switch (text[1]) { ...
13:55:35 <Simon> if there are 20 possibly tag names there's no point doing 19 strcmps to find the last one
13:55:39 <Simon> possible*
13:58:35 <jralls> That's incredibly ugly, incredibly brittle, and very unlikely to be any faster.
14:00:50 <jralls> Hash the tags and do a switch() on the hash.
14:01:14 <Simon> that makes an assumption that the hashes always end up being unique
14:01:38 <Simon> if the compiler generated it then it wouldn't be so ugly, hence cttrie
14:01:43 <Simon> but the problem is likely to be somewhere else
14:02:19 <Simon> I don't understand why it takes so long for my file to load unless something changed in how the price db works?
14:02:25 <jralls> If there are only 20 tags then hash collisions are very unlikely.
14:02:31 <Simon> it's much faster in the UI now, but I have 42305 prices
14:03:39 *** kcin has quit IRC
14:04:09 <jralls> The pricedb is a nested hash table but at the bottom it's a linked list. If you have 42K prices in 4K commodity pairs then it will be OK. If its 4 commodity pairs the 10K-long lists will kill performance.
14:04:17 *** kcin has joined #gnucash
14:06:27 <Simon> there are prices for 4 currencies and then prices for ~20 things in one of 2 currencies (most of those in one currency)
14:07:17 <jralls> so 10K-long lists are possible.
14:09:30 *** celeste has joined #gnucash
14:09:30 *** ChanServ sets mode: +v celeste
14:11:02 <jralls> There's a remove-old button that will delete all prices older than a certain date. Unfortunately that's probably not what you want; something that would keep only weekly/monthly/quarterly prices before a certain date would be more useful.
14:12:26 <Simon> if I remove all the prices the load time halves
14:12:52 <Simon> most of the time the huge pricedb hasn't been a problem (in 2.6) except if I want to use the price editor
14:13:08 *** Mechtilde has quit IRC
14:13:15 <Simon> in 4.6 the price editor is usable again
14:14:16 *** Mechtilde has joined #gnucash
14:19:05 <jralls> OBTW there are several scrubs on 2.6 data for 3 and 4, including IIRC a pricedb one. They'll take a while the first time you open a 2.6 book but once done and saved it will be fast again. If you have import account matching history there's another one that runs the first time you import into an account that also takes a while.
14:20:26 <Simon> the file has been saved numerous times since upgrading
14:20:45 *** chris_ has quit IRC
14:21:08 *** chris_ has joined #gnucash
14:21:09 *** gncbot sets mode: +o chris_
14:58:14 *** qwer has quit IRC
14:58:19 *** field^Zzz3 has quit IRC
15:11:43 *** qwer has joined #gnucash
15:12:42 *** qwer has joined #gnucash
15:37:30 *** sbluhm has quit IRC
15:41:49 *** qwer has quit IRC
15:43:38 *** Mechtilde has quit IRC
15:57:14 *** sbluhm has joined #gnucash
16:01:17 *** jralls has quit IRC
16:01:54 *** jralls has joined #gnucash
16:01:55 *** ChanServ sets mode: +o jralls
16:05:24 *** kcin1 has joined #gnucash
16:05:48 *** kcin has quit IRC
16:07:52 *** kcin1 is now known as kcin
16:32:57 *** kramer has quit IRC
16:36:54 *** kramer has joined #gnucash
16:36:54 *** ChanServ sets mode: +v kramer
16:39:25 *** David has quit IRC
16:39:43 *** David has joined #gnucash
16:44:40 <jralls> .
17:07:56 *** kcin has quit IRC
17:09:12 *** jervin has joined #gnucash
17:11:14 *** gjanssens has quit IRC
17:16:19 *** jervin has quit IRC
17:35:31 <Simon> I can reduce the load time by 20% just by commenting out the debug log messages in sixtp_sax_end_handler and dom_tree_to_text
17:36:10 <Simon> there are over 2 million calls to g_log/g_logv/log4glib_handler
17:36:26 <Simon> 4 million calls to qof_log_check
17:36:45 <Simon> (I realise that one is transaction logging, not event logging)
17:37:15 <Simon> oh apparently it's not
17:39:08 <Simon> there is far far too much effort required to find out what the log level is
17:39:28 <Simon> it's calling split_domain() in qof_log_check() over 4 million times
17:40:39 <Simon> which means it's not going through the "if (level < module->m_level)" fast path
17:40:43 *** sbluhm has quit IRC
17:40:49 <Simon> also... split_domain() is really expensive
17:43:36 <Simon> add_price() is calling gnc_pricedb_lookup_day_t64() for every price
17:45:27 <Simon> that's new since 2.6
17:48:06 <Simon> I think that could be optimised for adding a price that's newer than all known prices
17:48:37 <Simon> because the file *should* be in time order, so if the prices are in order and it's newer than the most recent price then there's no need to do a more expensive lookup
17:49:17 <Simon> if I take that check out I'm down to 15 seconds from 28 seconds originally
17:50:06 <Simon> 6s saved by removing the most frequent debug logging, and 7s saved by not doing the price overwrite check in add_price()
17:50:39 <Simon> I'll have a look at this again tomorrow
17:58:01 <Simon> there are nearly 57m calls to the basic_string constructor and nearly 40m of them are from qof_log_check()
18:00:26 <jralls> Simon, all interesting observations. Are they from profile output?
18:03:51 <Simon> yes
18:03:59 <jralls> Simon, we might be able to get rid of the std::string construction in qof_log_check by changing split_domain to take a std::stringview instead of std::string.
18:04:10 <Simon> price_list_from_hashtable is making a g_list_copy 140k times
18:04:43 <Simon> I think the boost day of week stuff may be using up a lot of time too but I need to run this again with the above stuff omitted
18:05:04 <Simon> it took 30 minutes to run earlier so I won't do it again today
18:05:57 <Simon> https://wizzard.uuid.uk/tmp/callgrind.out.2021-07-04.23-05-14.xz
18:06:31 <jralls> Oh, callgrind. No wonder it takes a long time.
18:06:38 <Simon> the biggest problem is that qof_log_check shouldn't even be called
18:07:00 <Simon> I think the debug logging this frequently in the file load process may need to be compiled out
18:07:28 <Simon> or do a check for the log level once before starting and store that for use by those functions
18:08:16 <Simon> but qof_log_check could do with having an output log level as well as a default log level
18:08:50 <jralls> We need users to be able to crank up the logging to troubleshoot problems, so it has to stay. But I agree that parsing the logging domain on every call is dumb.
18:08:54 <Simon> so that if debug logging is disabled globally there's no need to check for modules that have it enabled
18:09:18 <Simon> yes but if you had an output log level that forced it to avoid checking, it could run a lot faster
18:12:31 *** jervin has joined #gnucash
18:15:04 <jralls> I think you mean that we could set a flag at startup that indicating that there was no extra logging requested and included that check in the start of qof_log_check.
18:19:36 <jralls> The next level improvement would be to not use std::string in split_domain. Even better would be to replace the domain strings with hashes.
18:24:10 <jralls> On prices, yse, boost::date_time is slow and awkward. gnc_pricedb_lookup_day_t64 could be bypassed when loading from the database, if it's in the database we already know that there's only one per day unless it's a very old database.
18:25:43 <jralls> price_list_from_hashtable is supposed to make a g_list_copy. The issue is what's calling it 140K times?
18:31:45 *** jervin1 has joined #gnucash
18:31:56 *** jervin has quit IRC
18:31:56 *** jervin1 is now known as jervin
18:37:01 <jralls> Simon it's also worth keeping in mind that number of times called is not the same, nor even really equivalent, to amount of time consumed. 10K calls to a function that allocates is nearly always worse than 100K calls to one that doesn't. Iterating through a linked list is 3x or more worse than doing so over a std::vector or std::array.
18:39:08 *** jervin1 has joined #gnucash
18:39:28 *** jervin has quit IRC
18:39:28 *** jervin1 is now known as jervin
18:43:38 *** User has quit IRC
19:22:55 *** Aussie_matt has joined #gnucash
19:26:50 *** jervin has quit IRC
19:33:11 *** Pegasus_RPG has joined #gnucash
20:04:52 <chris_> so I guess we'd be able to use accessors *and* call sqlite statements?
20:04:55 *** chris_ is now known as chris
20:29:41 *** waitman| has quit IRC
20:43:46 *** jervin has joined #gnucash
21:03:07 *** jervin has quit IRC
22:19:07 *** root_ has joined #gnucash
22:31:16 *** jervin has joined #gnucash
22:34:40 *** waitman- has joined #gnucash
22:37:05 *** jervin has quit IRC
22:39:18 *** root_ has quit IRC
23:23:04 *** jervin has joined #gnucash
23:28:02 *** jervin has quit IRC
23:31:39 *** Pegasus_RPG has quit IRC
23:31:41 *** Pegasus_RPG has joined #gnucash
23:32:59 *** Pegasus_RPG has quit IRC
23:36:45 *** celeste has quit IRC
23:36:51 *** celeste has joined #gnucash
23:36:51 *** ChanServ sets mode: +v celeste