Testing procedure
This page is a guide for Taskforce members who are implementing unit tests for the repository. We focus here on the technical procedure: please first read about the OSIPI Taskforce 2.3 testing approach, including the extent of testing, types of data, selection of reference values, etc.
A basic introduction to pytest is provided here.
In a nutshell, the procedure for adding/modifying tests for a specific area of functionality is:
- Create new feature branch
- Add test datasets
- Write new test functions
- Check that tests run locally
- Push to GitHub and submit a pull request to merge into the develop branch
The following sections cover specific aspects of this process. The existing tests for T1 measurement will be used for illustration.
The name of the new feature branch should describe what is being done, e.g. "T1mapping_test_update".
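As a rough sketch, the branch could be created and published from the command line as follows (branching from develop and the remote name origin are assumptions here):

```
git checkout -b T1mapping_test_update develop
git push -u origin T1mapping_test_update
```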
Test files and data should be grouped into a single folder for a specific type of functionality, e.g. test/T1_mapping.
Tests for all code contributions containing this functionality will be put here.
Test files should be named to reflect the functionality and the origin of the code contribution, e.g. test_t1_ST_SydneyAus.py.
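Based on the files mentioned on this page, the resulting layout for T1 mapping looks roughly like this:

```
test/T1_mapping/
    data/
        t1_brain_data.csv
        t1_prostate_data.csv
        t1_quiba_data.csv
    t1_data.py
    test_t1_ST_SydneyAus.py
    test_t1_MJT_EdinburghUK.py
```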
First we need some data. This can be stored in e.g. test/T1_mapping/data.
For this example, we have 3 CSV files corresponding to different datasets: t1_brain_data.csv, t1_prostate_data.csv and t1_quiba_data.csv.
Any suitable (open) format can be used, but we recommend avoiding whole images where possible at this stage of the project.
Each row of the CSV file contains one test case (e.g. 1 voxel or ROI). Each test case should include the following minimum information:
- a label identifying the test case, so that test results can be traced back if necessary
- the input parameters needed to run the code, e.g. signal(s), concentration(s), acquisition parameters, options etc.
- the reference output values to compare against the code output
In our test functions, we will use the parametrize decorator. This automatically runs the same test function with multiple sets of input parameters (e.g. multiple voxels). This is very convenient, but we need to format the test data before passing it to parametrize. The required format is a list of tuples: each list item is a tuple containing all of the information (label, parameters, reference values etc.) for a specific test case.
For T1 mapping this is achieved in the test/T1_mapping/t1_data.py module.
This contains 3 functions, one for each dataset (there is a bit of code repetition here, but in general the datasets may come in different formats). Each function performs the following (a sketch combining these steps is given after the list):
- Documents the source data and reference values as part of the docstring. The information included in this example is recommended as a minimum.
- Reads the test data. We have used pandas here to read the csv-format data into a dataframe.
- Assigns values for each variable to a list of test cases. We may need to convert between units, e.g.
  r1_ref = (df['R1']*1000.).tolist() # convert /ms to /s
  The aim is to make sure that we output the same units for all test datasets. Different code contributions may require parameters in different units, thus further conversion may be necessary within the test functions (see below).
- Specifies the tolerance(s) for this dataset. We add this information to our list of tuples.
- Returns the dataset as a list of tuples, e.g.
  pars = list(zip(label, fa_array, tr_array, s_array, r1_ref, s0_ref, a_tol, r_tol))
  return pars
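Putting these steps together, a reader function in t1_data.py might look roughly like the sketch below. The file path, CSV column names and tolerance values are assumptions for illustration only; see the actual module for the real details.

```python
import os

import pandas as pd


def t1_brain_data():
    """Import brain T1 test data.

    The data source and the origin of the reference values should be
    documented here, as described above.

    Returns
    -------
    list of tuples, one per test case:
        (label, fa_array, tr_array, s_array, r1_ref, s0_ref, a_tol, r_tol)
    """
    # read the test data into a dataframe (file path assumed)
    filename = os.path.join(os.path.dirname(__file__), 'data', 't1_brain_data.csv')
    df = pd.read_csv(filename)

    # assign values for each variable, converting units where needed
    # (the column names used here are assumptions)
    label = df['label'].tolist()
    fa_array = df[['fa1', 'fa2', 'fa3']].values.tolist()  # flip angles (deg)
    tr_array = df[['tr1', 'tr2', 'tr3']].values.tolist()  # TR (s)
    s_array = df[['s1', 's2', 's3']].values.tolist()      # signal intensities
    r1_ref = (df['R1']*1000.).tolist()                    # convert /ms to /s
    s0_ref = df['s0'].tolist()

    # specify the tolerance(s) for this dataset (example values)
    a_tol = [0.05] * len(label)
    r_tol = [0.] * len(label)

    # return the dataset as a list of tuples, one per test case
    pars = list(zip(label, fa_array, tr_array, s_array, r1_ref, s0_ref, a_tol, r_tol))
    return pars
```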
The basic approach is to create a test module file for one of the code contributions, e.g. test/T1_mapping/test_t1_ST_SydneyAus.py
Once this is up and running, additional test functions can be created to test other contributions using the same data, e.g. test_t1_MJT_EdinburghUK.py
Within the module, there will be one or more test functions. For example, the T1 code contributions include linear and non-linear fitting algorithms, so a test function is provided for each.
Note that test filenames and function names should include "test" in order to be recognised by pytest.
The first step is to define a parametrize decorator that will be used to decorate each test function:
parameters = pytest.mark.parametrize('label, fa_array, tr_array, s_array, r1_ref, s0_ref, a_tol, r_tol',
t1_data.t1_brain_data() +
t1_data.t1_quiba_data() +
t1_data.t1_prostate_data()
)
Note that the list of argument names and their order should match both the test function signature and the format of the output data returned by the read-data helper functions. In this example, we combine all three datasets to generate a single parametrize decorator.
Each test function includes the following steps (a full sketch assembling them is given after the list):
- Function declaration. The name should indicate the functionality, code contribution and sub-functionality being tested. Each test function should be decorated:
@parameters
def test_ST_SydneyAus_t1_VFA_nonlin(label, fa_array, tr_array, s_array, r1_ref, s0_ref, a_tol, r_tol):
- Prepare input data. We cannot guarantee that all code contributions will expect inputs in the same units. It may be necessary to convert some parameters:
tr = tr_array[0] * 1000. # convert s to ms
- Run the code to generate the "measured" output:
[s0_nonlin_meas, t1_nonlin_meas] = VFAT1mapping( fa_array, s_array, tr, method = 'nonlinear' )
- Convert output data (if necessary), so that the "measured" value has the same units as the reference value.
r1_nonlin_meas = 1000./t1_nonlin_meas # convert T1 (ms) to R1 (/s)
- Use an assert statement to compare measured and reference outputs, to determine whether the test will pass:
np.testing.assert_allclose( [r1_nonlin_meas], [r1_ref], rtol=r_tol, atol=a_tol )
Many different assert functions are available. In this example, assert_allclose "compares the difference between actual and desired to atol + rtol * abs(desired)".
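Assembled from the fragments above, the complete test module looks roughly like the following sketch. The import lines are assumptions based on the names used on this page; adjust them to match the actual location of the data module and code contribution.

```python
import numpy as np
import pytest

import t1_data  # the data module described above (import path assumed)
# import path of the code contribution under test is assumed:
from src.original.ST_SydneyAus.VFAT1mapping import VFAT1mapping

parameters = pytest.mark.parametrize('label, fa_array, tr_array, s_array, r1_ref, s0_ref, a_tol, r_tol',
                                     t1_data.t1_brain_data() +
                                     t1_data.t1_quiba_data() +
                                     t1_data.t1_prostate_data()
                                     )


@parameters
def test_ST_SydneyAus_t1_VFA_nonlin(label, fa_array, tr_array, s_array, r1_ref, s0_ref, a_tol, r_tol):
    # prepare input data: this contribution expects TR in ms
    tr = tr_array[0] * 1000.  # convert s to ms

    # run the code to generate the "measured" output
    [s0_nonlin_meas, t1_nonlin_meas] = VFAT1mapping(fa_array, s_array, tr, method='nonlinear')

    # convert output so that it has the same units as the reference value
    r1_nonlin_meas = 1000./t1_nonlin_meas  # convert T1 (ms) to R1 (/s)

    # compare measured and reference outputs
    np.testing.assert_allclose([r1_nonlin_meas], [r1_ref], rtol=r_tol, atol=a_tol)
```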
To check that the tests run locally, navigate to the root directory of the repository, open a Python console and run:
import pytest
pytest.main(["--maxfail=5"])
Pytest has many command-line options to control the nature of the output. A "." is shown for each test case that passes and an "F" for each case that fails. For test failures, further information is shown, including the input parameters (the "label" variable is helpful here) and the measured and expected output values.
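For example, to run only the T1 mapping tests with more verbose output, additional pytest command-line options can be passed in the same way (the path and options below are just an illustration):

```python
import pytest

# run only the T1 mapping tests, reporting each test case by name
pytest.main(["test/T1_mapping", "-v", "--maxfail=5"])
```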
TBC