How to implement software engineering best practices in the context of ENCORE? #15
Replies: 7 comments, 1 reply
-
Software Engineering tools
A range of tools is available to assist in software engineering. We need to think about how to improve software engineering practices in the context of ENCORE without enforcing a specific tool.
-
Education
Education and training are important. Nowadays, many (online) workshops, courses, and meetings focus on software engineering, which may benefit both individual researchers and reproducibility. It is not the task of ENCORE to train researchers in software engineering, but the ENCORE guidelines can create more awareness.
-
Code testing
Test specific, isolated pieces of code behaviour to reduce coding errors and ensure intended functionality, especially as code increases in complexity. Describe whether software tests have been performed and how to re-run them. According to IBM, there are many different types of software tests, each with specific objectives and strategies.
Not all of these tests are equally important for scientific software. For ENCORE we need to decide which tests must be done at a minimum and how to perform and document them.
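As a minimal sketch of what a documented, re-runnable test could look like, the Python example below uses the standard-library unittest module. The function normalize_counts and its test cases are hypothetical, chosen only for illustration; ENCORE does not prescribe a test framework or a minimum set of tests.

```python
import unittest


def normalize_counts(counts):
    """Scale a list of non-negative counts so that they sum to 1.

    Hypothetical analysis helper, used here only to have something to test.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must not sum to zero")
    return [c / total for c in counts]


class TestNormalizeCounts(unittest.TestCase):
    """Unit tests: each method checks one isolated behaviour."""

    def test_fractions_sum_to_one(self):
        result = normalize_counts([2, 3, 5])
        self.assertAlmostEqual(sum(result), 1.0)

    def test_known_values(self):
        self.assertEqual(normalize_counts([1, 1]), [0.5, 0.5])

    def test_all_zero_input_is_rejected(self):
        with self.assertRaises(ValueError):
            normalize_counts([0, 0])
```

If this were saved as test_normalize.py (the file name is an assumption), the tests could be re-run with `python -m unittest test_normalize`. Recording that command in the project README would cover the "how to re-run these tests" part.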
-
Code documentation
Write comments as you code (not afterwards). Modern IDEs (e.g., PyCharm and DataSpell from JetBrains, https://www.jetbrains.com/) can assist in automatically generating documentation strings as you write code, which removes the burden of having to remember to write comments. Be aware of guidelines and tools to (automatically) generate documentation, such as Sphinx (https://www.sphinx-doc.org) using Python docstrings (https://www.geeksforgeeks.org/python-docstrings), and r2readthedocs and roxygen2 (https://cran.r-project.org/web/packages/roxygen2/index.html) for R. These are useful for improving reproducibility. We used Sphinx for the documentation of the sFSS Navigator. There are different types of documentation:
In bold: the documentation that should be provided at a minimum in the context of ENCORE.
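To illustrate the kind of docstring that documentation generators can pick up, here is a small Python sketch. The function celsius_to_kelvin is hypothetical, and the NumPy-style section layout is only one possible convention (Sphinx can render it through its napoleon extension); ENCORE does not mandate a particular docstring style.

```python
def celsius_to_kelvin(temp_c):
    """Convert a temperature from degrees Celsius to kelvin.

    Parameters
    ----------
    temp_c : float
        Temperature in degrees Celsius; must be at or above -273.15.

    Returns
    -------
    float
        Temperature in kelvin.

    Raises
    ------
    ValueError
        If ``temp_c`` is below absolute zero.
    """
    # Reject physically impossible input instead of returning a negative value.
    if temp_c < -273.15:
        raise ValueError("temperature below absolute zero")
    return temp_c + 273.15
```

Because the docstring is written together with the function, `help(celsius_to_kelvin)` in an interactive session already shows usable documentation, before any documentation tool is configured.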
-
Integrated development environment (IDE)
Integrated development environments help to improve software. Consider using an IDE, e.g., Visual Studio Code (https://code.visualstudio.com), PyCharm or DataSpell from JetBrains (https://www.jetbrains.com), or RStudio (https://www.rstudio.com). IDEs offer various advantages.
Proposal: ENCORE should not impose a specific IDE.
-
AI-based tools
Large Language Models (LLMs) increasingly play a role in software development, testing, and documentation (e.g., Copilot, ChatGPT). Copilot is also integrated with GitHub Codespaces. It is worthwhile to try out these new tools.
-
Scientific Software Engineering
-
The importance of Software Engineering for reproducibility
Over the past two decades, an increasing number of biomedical researchers have become involved in computational research. Many of these researchers have never been formally trained in scientific computing and software engineering (e.g., design, programming, documentation), software version control, high-performance computing infrastructures, Unix/Linux (still the major platform for scientific computing), algorithm design, or the use of (Jupyter, R) notebooks.
Lack of such skills may negatively affect reliability and transparency of software and, consequently, reproducibility. For example, software may be poorly designed and documented, making it difficult to understand, use, modify, and debug.
One resulting problem is that we have no way of knowing whether the code being used to generate the computational results is doing what the researchers think it is doing. This is one reason why ENCORE proposes to start organizing and documenting from the start of a project, since this increases the chance that conceptual errors or software bugs are detected at an early stage by the researchers themselves or their supervisors.
Software engineering is a discipline in its own right and includes the design, implementation, documentation, testing, and deployment of software. Following best practices for scripting, functional programming, or object-oriented programming may significantly improve the quality of the code, but requires training and experience. The use of integrated development environments, automated quality checks, and (unit) testing also helps to improve software, and Large Language Models will increasingly play a role in software development, testing, and documentation (e.g., Copilot, ChatGPT).
In addition, software documentation often leaves much to be desired. A recent report concluded that researchers are generally not aware of whom they are writing documentation for or what documentation is required. Currently, ENCORE does not provide specific instructions for coding style (e.g., PEP 8 for Python and the tidyverse style guide for R) or documentation design, because it is probably more effective to train scientists in the art of software engineering. Instead, general guidelines are provided in a README file.
Awareness of guidelines and tools to (automatically) generate documentation such as Sphinx (Brandl, 2021) for Python, and r2readthedocs (r2readthedocs, 2023) and roxygen2 (Roxygen2, 2023) for R, will also help to improve reproducibility. We used Sphinx for the documentation of the sFSS Navigator.
In general, appropriate training on reproducibility approaches could already significantly improve the current situation and will at least create awareness of the tremendous amount of literature about many aspects of sound scientific computing practices.