
Testing procedure


This page is a guide for Taskforce members who are implementing unit tests for the repository. We focus here on the technical procedure: please first read about the OSIPI Taskforce 2.3 testing approach, including the extent of testing, types of data, selection of reference values etc.

A basic introduction to pytest is provided here.

Tests should be contributed using Git, as described here.

Technical procedure for developing tests

In a nutshell, the procedure for adding/modifying tests for a specific area of functionality is:

  • Create new feature branch
  • Add test datasets
  • Write new test functions, including results logging
  • Check that tests run locally
  • Push to GitHub and submit a pull request to merge with the develop branch.
  • Request a review of the pull request from one of the Taskforce co-leads and from anyone else as required.

The following sections cover specific aspects of this process. The existing tests for T1 measurement will be used for illustration.

Location and naming

The name of the new feature branch should describe what is being done, e.g. "T1mapping_test_update". Test files and data should be grouped into a single folder for a specific type of functionality, e.g. test/T1_mapping.
Tests for all code contributions containing this functionality will live here. Test files and methods should be named to reflect the functionality and the origin of the code contribution, e.g. test_t1_ST_SydneyAus.py.

Adding the test data

First we need some data. This can be stored in e.g. test/T1_mapping/data.
For this example, we have 3 CSV files corresponding to different datasets: t1_brain_data.csv, t1_prostate_data.csv and t1_quiba_data.csv.
Any suitable (open) format can be used, but we recommend avoiding whole images where possible at this stage of the project.

Each row of the CSV file contains a test case (e.g. one voxel or ROI). Each test case should include the following minimum information (a hypothetical example is sketched after this list):

  • label identifying each test case, so that test results can be traced back if necessary
  • input parameters needed to run the code, e.g. signal(s), concentration(s), acquisition parameters, options etc.
  • reference output values to compare against code output
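
For illustration only, the first lines of such a file might look something like this; the column names and values below are hypothetical and do not reproduce the actual repository data:

    label,S_fa2,S_fa5,S_fa10,S_fa15,TR_s,R1_per_ms,S0
    brain_voxel_001,55.1,102.3,167.8,180.2,0.0045,0.00098,5000.0
    brain_voxel_002,60.4,110.7,171.3,184.9,0.0045,0.00102,4800.0

Here each row carries a label, the inputs (signals at each flip angle and TR) and the reference outputs (R1 and S0).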

Reading the test data

In our test functions, we will use the parametrize decorator. This automatically runs the same test function using multiple sets of input parameters (e.g. multiple voxels). This is very convenient, but we need to format the test data before passing it to parametrize. The required format is a list of tuples: each list item is a tuple containing all of the information (label, parameters, reference values etc.) for a specific test case.
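
For example, with the argument order used for T1 mapping below, a list of two test cases might look something like this (the values are purely illustrative):

# one tuple per test case: label, inputs, reference values, tolerances
test_data = [
    ('brain_voxel_001', fa_array_1, tr_array_1, s_array_1, 0.98, 5000., 0.05, 0.),
    ('brain_voxel_002', fa_array_2, tr_array_2, s_array_2, 1.02, 4800., 0.05, 0.),
    ]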

For T1 mapping this is achieved in the test/T1_mapping/t1_data.py module.
This contains 3 functions: one for each data set (there is a bit of code repetition here, but in general the datasets may come in different formats). Each function performs the following:

  • Documents the source data and reference values as part of the docstring. The information included in this example is recommended as a minimum.
  • Reads the test data. We have used pandas here to read the csv-format data into a dataframe.
  • Assigns values for each variable to a list of test cases. We may need to convert between units:
    r1_ref = (df['R1']*1000.).tolist() # convert /ms to /s
    The aim is to make sure that we output the same units for all test datasets. Different code contributions may require parameters in different units, thus further conversion may be necessary within the test functions (see below).
  • Specifies the tolerance(s) for this dataset. We add this information to our list of tuples.
  • Returns the dataset as a list of tuples, e.g.
    pars = list(zip(label, fa_array, tr_array, s_array, r1_ref, s0_ref, a_tol, r_tol))
    return pars

Using a separate module to read and parse the test data reduces repetition. Alternatively, we could read/parse the data in each test file, but since all T1 contributions are tested using the same data, this is unnecessary.
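
Putting these steps together, a dataset function might look roughly like the sketch below. The column names, array lengths, units and tolerances are assumptions used for illustration; see t1_data.py for the actual implementation.

import pandas as pd


def t1_brain_data():
    """Sketch of a dataset function (see t1_data.py for the real code).

    The docstring should document the source of the data and of the
    reference values.
    """
    df = pd.read_csv('test/T1_mapping/data/t1_brain_data.csv')
    # assign values for each variable to a list of test cases
    label = df['label'].tolist()  # hypothetical column names from here on
    fa_array = df[['fa1', 'fa2', 'fa3', 'fa4']].values.tolist()  # deg
    tr_array = df[['tr1', 'tr2', 'tr3', 'tr4']].values.tolist()  # s
    s_array = df[['s1', 's2', 's3', 's4']].values.tolist()
    r1_ref = (df['R1'] * 1000.).tolist()  # convert /ms to /s
    s0_ref = df['S0'].tolist()
    # specify the tolerances for this dataset (one value per test case)
    a_tol = [0.05] * len(label)
    r_tol = [0.] * len(label)
    # return the dataset as a list of tuples
    pars = list(zip(label, fa_array, tr_array, s_array, r1_ref, s0_ref, a_tol, r_tol))
    return pars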

Writing a test module

The basic approach is to create a test module file for one of the code contributions, e.g. test/T1_mapping/test_t1_ST_SydneyAus.py.
Once this is up and running, additional test functions can be created to test other contributions using the same data, e.g. test_t1_MJT_EdinburghUK.py.

Within the test module, there will be one or more test functions. For example, the T1 code contributions include linear- and non-linear fitting algorithms, thus a test function is provided for each. Note that test filenames and function names should include "test" in order to be recognised by pytest.

The first step is to import pytest, helper functions and the test cases:

import pytest
import numpy as np
from ..helpers import osipi_parametrize # helper for running tests on multiple cases
from . import t1_data # module for providing all t1 test cases
from src.original.ST_USydAUS_DCE.VFAT1mapping import VFAT1mapping # functions to test

Then get the test data and create a string specifying the argument names of the test functions. The list of argument names matches both the test function signature and the order in which parameters are returned by the t1_data functions. In this example, we combine all three datasets to make a single list of test cases:

arg_names = 'label, fa_array, tr_array, s_array, r1_ref, s0_ref, a_tol, r_tol'
test_data = (
    t1_data.t1_brain_data() +
    t1_data.t1_quiba_data() +
    t1_data.t1_prostate_data()
    )

We're now ready to define the test functions.

Each test function includes the following steps:

  • Apply the parametrize decorator, using the variables specified above as arguments. This tells pytest to run the test function with each test case in the list. Using the xf_labels argument, we can mark any test cases that are expected to fail by listing their labels.
@osipi_parametrize(arg_names, test_data, xf_labels = ['Pat5_voxel5_prostaat'])
  • Function declaration. The name should indicate the functionality, code contribution and sub-functionality being tested:
def test_ST_SydneyAus_t1_VFA_lin(label, fa_array, tr_array, s_array, r1_ref, s0_ref, a_tol, r_tol):
  • Prepare input data. We cannot guarantee that all code contributions will expect inputs in the same units. It may be necessary to convert some parameters:
tr = tr_array[0] * 1000. # convert s to ms
  • Run the code to generate the "measured" output:
[s0_nonlin_meas, t1_nonlin_meas] = VFAT1mapping( fa_array, s_array, tr, method = 'nonlinear' )
  • Convert output data (if necessary), so that the "measured" value has the same units as the reference value.
r1_nonlin_meas = 1000./t1_nonlin_meas # convert T1 (ms) to R1 (/s) 
  • Assert statement to compare measured and reference outputs, to determine whether the test will pass:
np.testing.assert_allclose( [r1_nonlin_meas], [r1_ref], rtol=r_tol, atol=a_tol )

Many different assert functions are available. In this example, assert_allclose checks that the difference between the measured and reference values does not exceed atol + rtol * abs(reference).
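
Putting the steps above together, a complete test function has roughly the following shape (shown here for the non-linear fit; the unit conversions are specific to this contribution and may differ for others):

@osipi_parametrize(arg_names, test_data, xf_labels = ['Pat5_voxel5_prostaat'])
def test_ST_SydneyAus_t1_VFA_nonlin(label, fa_array, tr_array, s_array, r1_ref, s0_ref, a_tol, r_tol):
    # prepare input data
    tr = tr_array[0] * 1000. # convert s to ms

    # run the code contribution
    [s0_nonlin_meas, t1_nonlin_meas] = VFAT1mapping( fa_array, s_array, tr, method = 'nonlinear' )

    # convert the output to the same units as the reference value
    r1_nonlin_meas = 1000./t1_nonlin_meas # convert T1 (ms) to R1 (/s)

    # compare measured and reference values
    np.testing.assert_allclose( [r1_nonlin_meas], [r1_ref], rtol=r_tol, atol=a_tol )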

Logging the results

In addition to the pass/fail result from pytest, we want to visualize the results from the tests (i.e. the output of the contributed code vs the reference values). For this purpose, the reference and output values are saved in a CSV file. The following steps need to be added to each test file:

Initialize the log file in setup_module, where a prefix for the filename is set. For the prefix we use the same naming convention as the src/test files. In addition, the column names need to be specified.

import os  # needed for os.makedirs in setup_module
# log_init and log_results are logging helpers defined elsewhere in the repository's test code

filename_prefix = ''


def setup_module(module):
    # initialize the logfiles
    global filename_prefix # we want to change the global variable
    os.makedirs('./test/results/DCEmodels', exist_ok=True)
    filename_prefix = 'DCEmodels/TestResults_models'
    log_init(filename_prefix, '_MJT_UoEdinburghUK_2CUM', ['label', 'time (us)', 'vp_ref', 'fp_ref', 'ps_ref', 'vp_meas', 'fp_meas', 'ps_meas'])

Log the results (this goes before the assert statement):

    # log results
    log_results(filename_prefix, '_MJT_UoEdinburghUK_2CUM', [
        [label, f"{exc_time:.0f}", vp_ref, fp_ref, ps_ref, vp_meas, fp_meas, ps_meas]])

The execution time can be measured with perf_counter (from Python's time module). Timing calls need to be added before and after the code under test is run. However, so far we haven't really made use of this information.

    # run code (requires "from time import perf_counter" at the top of the test module)
    tic = perf_counter()
    pk_pars, C_t_fit = dce_fit.conc_to_pkp(C_t_array, pk_model)
    vp_meas = pk_pars['vp']
    fp_meas = pk_pars['fp']
    ps_meas = pk_pars['ps']
    exc_time = 1e6 * (perf_counter() - tic)  # measure execution time
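
Taken together (and assuming tolerances a_tol and r_tol are supplied with each test case, as in the T1 example), the relevant part of the test function body is ordered as follows: time and run the code, log the results, then assert:

    # run code (timed)
    tic = perf_counter()
    pk_pars, C_t_fit = dce_fit.conc_to_pkp(C_t_array, pk_model)
    vp_meas, fp_meas, ps_meas = pk_pars['vp'], pk_pars['fp'], pk_pars['ps']
    exc_time = 1e6 * (perf_counter() - tic)  # execution time in microseconds

    # log results before the assert, so they are recorded even if the test fails
    log_results(filename_prefix, '_MJT_UoEdinburghUK_2CUM', [
        [label, f"{exc_time:.0f}", vp_ref, fp_ref, ps_ref, vp_meas, fp_meas, ps_meas]])

    # assert statement comes last
    np.testing.assert_allclose([vp_meas, fp_meas, ps_meas], [vp_ref, fp_ref, ps_ref],
                               rtol=r_tol, atol=a_tol)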

Running the tests locally

To check that the tests are working locally, navigate to the root directory of the repository and open a Python console:

import pytest
pytest.main(["-k ST_SydneyAus", "--maxfail=5"])

Pytest has many command-line options to control execution and output. For example, the "-k" flag can be used to run a subset of the tests, reducing execution time. A "." is shown for each test case that passes and an "F" for each case that fails. For test failures, further information is shown, including the parameters (the "label" variable is helpful here) and the measured and expected output values.
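
For example, adding the standard "-v" (verbose) flag lists each test case individually, with its outcome (PASSED/FAILED), instead of the dot/"F" summary:

pytest.main(["-v", "-k ST_SydneyAus"])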

Dealing with test failures

It is likely that some of the tests will fail initially, for example due to noisy data or algorithm differences.
All of the test cases should either pass or be marked as expected failures by the time the new tests are merged into develop.
How to deal with test failures will depend on the functionality being tested. We suggest the following approach:

  • Adjust the tolerances so that they are appropriate to the test data (e.g. range of values, noise level). The tester (with help from other Taskforce members) will need to make a scientific judgement as to what constitutes an acceptable tolerance.
  • If a small number of cases continue to fail, try to find the reason. Depending on the reason, either mark the cases as expected failures (e.g. poor accuracy is expected for this particular combination of parameters) or remove cases from the test data (e.g. poor quality data).
  • If the tested code continues to fail "unreasonably", raise a GitHub issue and/or liaise with the code contributor to discuss potential causes.
  • Document specific failures and general issues under the test function declaration:
@osipi_parametrize(arg_names, test_data, xf_labels = ['Pat5_voxel5_prostaat'])
def test_ST_SydneyAus_t1_VFA_lin(label, fa_array, tr_array, s_array, r1_ref, s0_ref, a_tol, r_tol):
    # NOTES:
    #   Signal is scaled to prevent multiple test failures for prostate test cases.
    #   Expected fails: 1 low-SNR prostate voxel