Difference between revisions of "Testing"

From GnuCash
Latest revision as of 00:59, 5 May 2024


A good set of tests is a critical requirement for modern software development, both to ensure the quality of the product and to help developers to quickly modify code without introducing bugs or causing regressions. The literature on software testing is vast; excellent tutorials and references are available both online and in print, see Literature Survey.

All developers are encouraged to review tests and ensure that any work is covered by both acceptance and unit tests.

Current Test Architecture

Acceptance Tests

Acceptance tests in GnuCash are based on a home-grown (or anonymously sourced) set of macros and functions which can be found in src/test-core and src/engine/test-core. The quality, scope, and coverage of these tests vary greatly; some parts of GnuCash are tested lightly or not at all while others are tested fairly extensively. Although I've labelled them "acceptance tests", in many cases they're written more as unit tests whose scope is a single function rather than a whole module.

Unit Tests

Unit testing, invented by Kent Beck in the early 1990s, seeks to test the public interface of classes as thoroughly as possible: All member functions should be tested with as much variation of their parameters as possible, with an emphasis on corner cases. Tests should avoid dependence on the implementation to avoid brittleness.
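As a rough illustration of testing a public interface with an emphasis on corner cases, here is a minimal, framework-free sketch. The function days_in_month and its helper are hypothetical and not part of GnuCash, and plain assert is used only to keep the example self-contained; real GnuCash tests would use one of the frameworks described below.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical function under test; not part of GnuCash.
int days_in_month(int year, int month)
{
    if (month < 1 || month > 12)
        throw std::out_of_range("month must be in 1..12");
    static const int days[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    const bool leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    return (month == 2 && leap) ? 29 : days[month - 1];
}

// Returns true if days_in_month rejects the given month with std::out_of_range.
bool rejects_month(int month)
{
    try
    {
        (void)days_in_month(2024, month);
    }
    catch (const std::out_of_range&)
    {
        return true;
    }
    return false;
}

// Exercises only the public interface, varying the parameters around the
// places where behavior changes.
void test_days_in_month()
{
    assert(days_in_month(2023, 1) == 31);   // ordinary case
    assert(days_in_month(2023, 2) == 28);   // February in a non-leap year
    assert(days_in_month(2024, 2) == 29);   // February in a leap year
    assert(days_in_month(1900, 2) == 28);   // century not divisible by 400
    assert(days_in_month(2000, 2) == 29);   // century divisible by 400
    assert(rejects_month(0));               // just below the valid range
    assert(rejects_month(13));              // just above the valid range
}
```

None of the checks depends on how days_in_month is implemented, so the table inside it could be replaced without touching the test.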

GnuCash has adopted the GLib testing framework to facilitate unit testing of GObject-based classes and GLib-dependent code. Muslim Choclov wrote unit tests for the most important modules in LibQOF as a GSoC2011 project. Work continues to get all of LibQOF and the engine fully tested to facilitate major architectural changes needed to make GnuCash a proper database application.

For C++ code GnuCash has adopted the Google Test Framework.

Guile provides full support for SRFI-64 and there are numerous GnuCash utilities available to speed test coding.

Running Tests

 make check

or

 ninja check

builds and runs the full test suite. N.B. Some tests depend on additional libraries, and the CMake built-in target test does not build them. Once the check target has been built, check and test are synonyms.

These targets run the tests using the ctest command. Its default is to run all tests and to write the output to CMAKE_BUILD_DIRECTORY/Testing/Temporary/LastTest.log. Both can be overridden with command-line options, e.g.

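 ctest -V -R "test-engine.*"

will run all of the tests whose names match the regular expression test-engine.* (that is, those with test-engine in their names) and will write the results to stdout. Quoting the expression keeps the shell from expanding it as a glob. Individual tests can be built by naming their targets, e.g.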

 ninja test-gnc-timezone

and those written in C which produce an executable can be run directly:

 bin/test-gnc-timezone

while those that don't can be run using ctest -R.

CMake Asan Target, Coverage Options

GnuCash's CMakeLists.txt defines a special build type, Asan (selected with CMAKE_BUILD_TYPE), and two options that complement testing. There are GitHub workflows that run jobs with them after every push.

Address Sanitizer

Google developed and contributed to the world the concept of compiler sanitizers, one of the more useful of which is Address Sanitizer. It adds code to the executable that very efficiently tracks memory allocations, deallocations, and references on both the stack and the heap. When it detects a misuse it crashes immediately, reporting the current stack trace, the stack trace from which the memory was allocated, and (if it's a use-after-free) the stack trace in which it was freed. This makes it vastly quicker to fix this kind of bug. On non-Apple Unixes it can also report leaks and One Definition Rule violations. We have rather a lot of both in our tests, so reporting of each is off by default; CMakeLists.txt provides the options LEAKS and ODR to enable them at configure time. For example

   cmake -G Ninja -DCMAKE_BUILD_TYPE=Asan -DLEAKS=ON

will do an Asan build with leak reporting enabled and ODR violation reporting disabled.


A CI workflow (see below) runs an Asan build with both leak and ODR violation detection disabled.

Coverage

Another flavor of instrumentation that compilers provide is code coverage. When this is enabled the compiler inserts code that monitors each line of source code, each function, and (though not all compilers support it well and it's a bit tricky in some contexts) branches. With the counters in place and zeroed one can run the test suite and then collect from the counters how many times each line of code was run. There are tools available that can digest the counter data into dashboards displaying how much of each directory was exercised by the tests. One can then drill down through the directory structure to individual source files to see which lines aren't tested, providing guidance about where the tests need improvement.
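To make the counter idea concrete, here is a hand-instrumented toy. It is only a conceptual sketch: real compilers insert and store their counters automatically and in their own format, and the names sign, hit_count, reset_counters, and uncovered_locations are invented for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hand-placed counters standing in for the ones a compiler would insert.
std::array<unsigned, 3> hit_count{};

// Hypothetical function with one counter per location of interest.
int sign(int value)
{
    if (value < 0) { ++hit_count[0]; return -1; }
    if (value > 0) { ++hit_count[1]; return 1; }
    ++hit_count[2];
    return 0;
}

// Zero the counters, as one does before a fresh coverage run.
void reset_counters()
{
    hit_count.fill(0);
}

// Count the instrumented locations that were never executed.
std::size_t uncovered_locations()
{
    std::size_t missing = 0;
    for (unsigned count : hit_count)
        if (count == 0)
            ++missing;
    return missing;
}
```

A test run that calls sign only with negative and positive values leaves one location uncovered, which is the kind of gap a coverage report points out.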

The Guile programming environment provides a similar capability for Scheme code.

LCOV is a facility written by the Linux Kernel team to visualize test coverage. It's basic but it's also ubiquitous on Linux. It has two programs, geninfo and genhtml, that do the actual work; lcov is a command wrapper. There is a fourth program, c++demangle, that translates C++ "mangled" function names into the names in the source code. CMake requires that they be present; if any are missing it will disable coverage instrumentation. They're generally all provided by a single lcov package that you can install with apt, dnf, pacman, etc.

CMakeLists.txt provides two options, COVERAGE and GUILE_COVERAGE, and associated targets lcov-initialize, lcov-collect, and lcov-generate-html.

COVERAGE and GUILE_COVERAGE are independent, so you can enable either or both. Beware, though: the penalty for COVERAGE is only a few seconds, but GUILE_COVERAGE will slow the test suite to well over 10 minutes, and longer still on a slow machine with few cores. The basic procedure after configuring with COVERAGE or GUILE_COVERAGE enabled and building GnuCash is to zero the counters, do something, collect the counters, and make a website. For example

   ninja lcov-initialize
   ninja check
   ninja lcov-collect
   ninja lcov-generate-html

is how the Github workflow produces the GnuCash coverage pages. Note that you don't necessarily have to run ninja check. If you're working on a single test program you can save considerable time, especially if it's Scheme, by running only that one program: Not only will you not have to wait for everything else that you're not interested in to run, you'll get a smaller, easier to navigate website showing the effects only of your work.

Continuous Integration: Github Actions

We use GitHub Actions to run the full test suite after every push and for every GitHub pull request. The Actions are defined by YAML files in .github/workflows in the GnuCash root directory. Individuals with GitHub forks can easily run the actions on their forks by enabling actions and setting the requisite branches. To catch breakage quickly we run tests on the default current Ubuntu LTS and old-LTS, in an Arch Linux Docker container, and on macOS. We also run an Asan build (see #Address Sanitizer) and its test suite to ensure that no memory violations have been added, and a #Coverage build to generate a set of web pages assessing our test code coverage. The results of that run are viewable at https://gnucash.github.io/gnucash/Coverage-HTML/. As of this writing they're far from satisfactory.

GitHub generally notifies pushers by email when actions fail; this is in part controlled by your notification settings, but it's not perfectly reliable. If your commit has a red X next to the SHA then it failed one or more tests and you need to look into it. You can also access the complete history of actions on the Actions tab.

For tests that fail a Github action but not on your local development system it can be helpful to keep a VM around that duplicates the build environment.

Policy

  • All new non-GUI code should include thorough unit tests. Automated testing of the GUI tends to be brittle, so GUI modifications should be hand tested in as many OS environments as possible before being committed.
  • make check run from the top build directory should pass before commits are pushed.

Unit Test Policies

  • Getter/Setter functions which only set or retrieve an instance member variable do not need to be tested.
  • Convenience functions which only wrap another function to change the function's name or to provide a default argument do not require testing.
  • Composed functions, or functions which simply string together a series of calls to other functions, need not be tested if the called functions are all tested, have no side effects, and where the composed function has only one flow of control.
  • There is some disagreement among testing gurus about whether a function's parameter variations should be exercised in a single test function or separately in a test function per function call. Use your judgement here. Remember that the dictum of Agile Development is to write a little bit at a time and to refactor as often as you need to. That applies as much to test code as it does to production code. It's OK to change your mind!
  • Similarly there is tension among the gurus about how much a test program should depend on real code and how much it should use mock objects to replace actual dependency code. Keeping in mind the goal of a short code-compile-test cycle, use your judgement. That said, at present much of GnuCash is rather interdependent and doesn't virtualize functions, a requirement for applying mocks. If you're writing new modules, do use modern OO techniques to minimize interdependence, and where it's necessary make sure to use virtual functions so that linking the rest of GnuCash isn't needed to test your work.
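The last policy, on virtual functions and mocks, can be sketched as follows. Everything here is hypothetical (PriceSource, position_value_in_cents, and MockPriceSource are not GnuCash classes), and the mock is written by hand to keep the example self-contained; GoogleMock can generate equivalent classes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical interface. The virtual function is what allows a test to
// substitute its own implementation.
class PriceSource
{
public:
    virtual ~PriceSource() = default;
    virtual long price_in_cents(const std::string& symbol) const = 0;
};

// Hypothetical code under test. It depends only on the interface, so the
// real price-fetching code need not be linked into the test.
long position_value_in_cents(const PriceSource& source,
                             const std::string& symbol, long quantity)
{
    return source.price_in_cents(symbol) * quantity;
}

// Hand-written mock: returns a canned price and records each request.
class MockPriceSource : public PriceSource
{
public:
    explicit MockPriceSource(long canned) : m_canned(canned) {}

    long price_in_cents(const std::string& symbol) const override
    {
        m_requested.push_back(symbol);
        return m_canned;
    }

    const std::vector<std::string>& requested() const { return m_requested; }

private:
    long m_canned;
    mutable std::vector<std::string> m_requested;
};

void test_position_value()
{
    MockPriceSource mock(1250);              // 12.50 per unit
    assert(position_value_in_cents(mock, "XYZ", 4) == 5000);
    assert(mock.requested().size() == 1);    // queried exactly once
    assert(mock.requested()[0] == "XYZ");    // and for the right symbol
}
```

Because position_value_in_cents takes the interface by reference, the test builds and runs without any of the code that would fetch real prices.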


Writing Tests

Google Test based tests

Tests on C++ code should be written using GoogleTest and GoogleMock.

There are plenty of websites devoted to writing unit tests. There are also a couple of encyclopedic books on the topic, xUnit Test Patterns by Gerard Meszaros and Working Effectively With Legacy Code by Michael Feathers.

Tests should be written in a test subdirectory of the directory containing the implementation file that you're testing. Create one if it doesn't already exist.

Once you've written your test you need to arrange for it to be compiled and run by the build system. The test directory needs a CMakeLists.txt if it doesn't already have one.

There's a handy function, gnc_add_test() that takes care of most of the boilerplate of adding a test to the build system. It takes four arguments: The test name, which must be unique in the whole build system and should be test-file-name where "file-name" is the file containing the functions you're testing; the source files for the test; a variable containing the include paths; and a variable containing the library flags and other cmake targets that the test depends upon. For example, suppose that we want to test foodir/foo.c. This file calls functions in libgncmod-engine and needs to include files from the libgnucash/engine directory. We'd use the following setup in CMakeLists.txt:

 set(test_foo_SOURCES ../foo.c test-foo.c)
 set(test_foo_INCLUDES ${CMAKE_SOURCE_DIR}/libgnucash/engine)
 set(test_foo_LIBS gncmod-engine)
 gnc_add_test(test-foo "${test_foo_SOURCES}" test_foo_INCLUDES test_foo_LIBS)

That gets the test built and run. We also need to make sure that the source files are included in the distribution tarball, so at the bottom of CMakeLists.txt we need:

 set(test_foodir_sources_DIST "${test_foo_SOURCES}")
 set(test_foodir_DIST CMakeLists.txt "${test_foodir_sources_DIST}")

And in foodir/CMakeLists.txt:

  set(foodir_DIST "${foodir_local_DIST}" "${test_foodir_DIST}" PARENT_SCOPE)

Finally, don't forget to add all of your new files to git when you commit your test!

GLib-test based tests

Several of our C libraries have unit tests written using the GLib test framework. While GoogleTest is preferred for writing new tests for C++ code it may be useful to write tests using this framework for C code before converting it to C++ to help prevent breaking something. This is still unit testing so the general references above still apply.

To set up unit testing in a directory:

  • Create a "test" directory if there isn't one already
  • Create a CMakeLists.txt in that test directory, again if there isn't one already.
  • The setup of a GLib test program in CMakeLists.txt is the same as for a GoogleTest test, see above.
  • Copy test-templates/test-module.c to your test directory, rename it, and create a target for it in CMakeLists.txt
  • Run
 test-templates/make-testfile "Your Name <you@your.email.address>" path/to/source

passing the path to the source file you want to write tests for. This will create a template test file for you with all of the necessary functions prototyped and commented out and a populated test suite function with the individual tests commented out.

  • Uncomment the definition for a function that you want to test and write your test function. Write Setup and Teardown functions as needed. Uncomment the execution line for the test function, adjusting the setup and teardown function names as necessary, in the test suite function at the end of the file.

There is a unit test support module with some useful functions for controlling logging and signals in common/test-core. If you use it, add

 test-core

to your test program's LIBS variable or add

 ${CMAKE_SOURCE_DIR}/test-core/unittest-support.c

to the SOURCES variable; in either case add

 #include <unittest-support.h>

in your test-suite file. You have two options for the actual test programs. You can write a bunch of separate programs with a few tests each or you can group several files containing tests into a single program.

Many Little Programs

  • Make a copy of test-templates/testmain.c for each program, renaming it appropriately, in your test directory.
  • Create fixtures and test functions and register the test functions in main(); there are comments in the file to guide you.
  • Set up a target in CMakeLists.txt as described above for each program.

Test Suites

A test suite is a collection of test functions registered in a test-suite function; main() runs the test-suite functions. This makes it easier to group tests into separate files with a master test program file to contain main(). We'll call the master test program source file the module file; it's conventionally named after the directory it's testing, e.g. test-engine.c. Normally you'll have a test-suite for each source file in the directory named utest-filename.c, e.g. utest-Split.c.

Note that as the C++ conversion progresses it may be necessary to compile some of these files as C++ in which case the extension will be .cpp instead of .c.
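The module-file and test-suite arrangement can be sketched in plain C++ as below. This shows only the structure and deliberately does not use the GLib testing API; the registry type, the suite functions, the test paths, and the trivial placeholder checks are all invented for the illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// A named test function that returns true on success.
using TestFunc = std::function<bool()>;
using Registry = std::vector<std::pair<std::string, TestFunc>>;

// Would live in its own file, one suite per source file under test.
void test_suite_split(Registry& registry)
{
    registry.emplace_back("/example/split/name-length",
                          [] { return std::string("Split").size() == 5; });
}

// A second suite, likewise in a separate file.
void test_suite_account(Registry& registry)
{
    registry.emplace_back("/example/account/not-empty",
                          [] { return !std::string("Account").empty(); });
}

// Runs every registered test; returns 0 if all pass and 1 otherwise, so the
// value can be used directly as the program's exit status.
int run_registry(const Registry& registry)
{
    int failures = 0;
    for (const auto& entry : registry)
        if (!entry.second())
            ++failures;
    return failures == 0 ? 0 : 1;
}

// What main() in the module file would do: let each suite register its
// tests, then run them all.
int run_all_suites()
{
    Registry registry;
    test_suite_split(registry);
    test_suite_account(registry);
    return run_registry(registry);
}
```

In this arrangement the module file's main() would simply return run_all_suites(), and adding a suite for another source file means adding one more registration call.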

Legacy Tests

In addition to GoogleTest and GLib Testing based tests GnuCash has several older tests based on a private framework. Most of these are module tests rather than unit tests and most don't do a good job of localizing test failures, both of which can make debugging test failures difficult. The framework isn't well documented so if you need to get into these tests you'll need to study the source code in test-stuff.h, test-stuff.c, test-engine-stuff.h, and test-engine-stuff.cpp.

Scheme Tests

We want to test Scheme as well. New tests should be written using SRFI-64. "Legacy" tests are just Scheme forms; some are unit tests and some are larger integration tests. Until recently there was no standard style or form other than that they had to exit with 0 for success or something else for failure so that ctest can tell which pass and which fail.

Scheme tests are added to CMakeLists.txt with the command gnc_add_scheme_test; multiple tests can be added with gnc_add_scheme_tests. They don't create executables in CMAKE_BUILD_DIRECTORY/bin so to run one by itself use

 ctest -R label-regex

where label-regex is a regular expression, not a glob. For example

 ctest -R ".*-barchart"

will run all of the tests whose names contain -barchart; end the expression with $ to match only names that end in it. Adding -V to the ctest options will send the output to stdout instead of CMAKE_BUILD_DIRECTORY/Testing/Temporary/LastTest.log.
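The difference between a regular expression and a glob is easy to trip over, so here is a small sketch of how such a pattern selects names. It uses std::regex_search as a stand-in for ctest's own matcher, on the assumption that the two agree for simple patterns like these; the test names are made up.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true if the regular expression matches anywhere in the test name.
// An unanchored search is assumed here to approximate ctest -R.
bool selects(const std::string& pattern, const std::string& test_name)
{
    return std::regex_search(test_name, std::regex(pattern));
}

void test_selects()
{
    // Hypothetical test names.
    assert(selects(".*-barchart", "test-net-barchart"));
    assert(!selects(".*-barchart", "test-linechart"));

    // The search is unanchored, so the leading ".*" adds nothing...
    assert(selects("-barchart", "test-net-barchart"));

    // ...and a name that merely contains the text is selected too.
    assert(selects(".*-barchart", "test-barchart-extra"));

    // Anchoring with "$" restricts the match to names that end in it.
    assert(!selects("-barchart$", "test-barchart-extra"));
    assert(selects("-barchart$", "test-net-barchart"));
}
```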

Test Coverage Status

See http://gnucash.github.io/gnucash/Coverage-HTML/ for the current status. To dig in, navigate to a file you're interested in and click it. The file's source code is displayed with three columns on the right: Line number, Branch Data, and Line data. There's a legend at the top, but it fails to mention that the number in Line data is the number of times that line was executed during the test suite run.

The per-push runs don't include Scheme code coverage because generating it is so slow. You can generate a local page with that information; see #Coverage.

Known Test Needs

  • GtkAction callbacks referenced directly in GtkBuilder UI files need at a minimum "presence testing" so that make check will fail if the callbacks don't compile for some reason.

Literature Survey

  • Test Patterns: Refactoring Test Code. The skeleton of this excellent manual for writing and improving unit tests is online; there is a pointer there for purchasing the book as well. While the book focuses on xUnit-style test frameworks (meaning jUnit and its many derivatives) most of the principles and patterns are applicable to any unit test code.