The Software Documentation Service (SDS) is a tool designed to display the software available on different HPC systems.
For HPC end users, it is a way to easily determine which software is available on which clusters, along with any relevant documentation, software type classification, example use, descriptions, etc.
For HPC admins, it is an easy way to provide documentation for software on their systems. All they need to do is provide the names of the software and the clusters each one is available on, and the SDS tool will provide the rest.
Bash or Zsh terminals are recommended.
Windows machine: install Miniconda or Anaconda manually.
The SDS tool is meant to exist independent of any HPC clusters and should not be run on any critical systems. The recommended method is to run the application inside of a VM and copy/add any important information to it.
- Clone the repo into your local machine:
git clone -b stand-alone --single-branch https://github.com/access-ci-org/SDS-Public.git
- Follow instructions to set Config Variables
- Follow instructions in Data Preparation to properly provide data for SDS
That should be all the necessary setup.
- Make sure docker is installed on your machine
- Make sure you have the relevant data available based on Data Preparation
- Run
sudo docker compose up -d
- The website will be available in 5 or so seconds at
localhost:8080
(and <your_ip_address>:8080)
- You can stop the services by running
sudo docker compose down
- To rebuild the image each time:
sudo docker compose up -d --build
- If you want to enable SSL certificates for your website, make the following changes:
  - In the nginx.conf file, comment out the entire first
  server {
  entry and uncomment the entire second
  server {
  entry.
  - In the docker-compose.yml file, comment out the
  - "8080:80"
  line and uncomment the
  - "443:443"
  line.
  - In the docker-compose.yml file, uncomment the
  # - ./ssl:/etc/nginx/ssl
  line.
    - This expects the SSL certificates to be in the project directory. If your SSL certificates are somewhere else, change the ./ssl portion to the path of the directory where the certificates are stored.
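Since the site takes a few seconds to come up after the containers start, a small polling helper can be handy in scripts. The following is a minimal sketch, not part of SDS: the function name wait_for_site and its timeout and interval parameters are assumptions made for illustration, and it targets the default non-SSL address localhost:8080.

```python
import time
import urllib.error
import urllib.request


def wait_for_site(url="http://localhost:8080", timeout=30.0, interval=1.0):
    """Poll `url` until it answers or `timeout` seconds pass.

    Returns True as soon as the server sends any HTTP response,
    False if nothing answered before the deadline.
    (Hypothetical helper; not shipped with SDS.)
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered, just not with a 2xx status.
            return True
        except (urllib.error.URLError, OSError):
            # Not accepting connections yet; wait and retry.
            time.sleep(interval)
    return False
```

Calling wait_for_site() right after sudo docker compose up -d returns True once the web server responds, or False if it is still unreachable after the timeout.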
- Run
source setup.sh
to set up your environment
- Run
python reset_database.py
to create and load your database
  - You can pass three different arguments to reset_database.py. Run
  python reset_database.py -h
  for more info. Read the entire help message before continuing.
  - To exclude some software from being displayed on the website, add it to the software_blacklist.txt file in the project directory. A basic list of blacklisted names is already provided.
- Run the application with
flask run
- If you want to run the application with a watcher (which will automatically update the database and app when the data is updated), run
python run.py
See
python run.py --help
for more information.
- Create a config.yaml file in the project folder.
- If you would like to use the SDS API to obtain and display more information about your software, request an API key from Sandesh (sla302@uky.edu).
- Inside the config.yaml file, add the following:
api:
  use_api: False
  api_key: "your api key here"
  use_curated_info: False
  use_ai_info: False
styles:
  primary_color: "your primary color here"
  secondary_color: "your secondary color"
  site_title: "Title for website here"
  logo: "logo file name"
general:
  user_name: default admin user
  password: default admin password
  share_software: False
View the CONFIGS.md file for information on what these configs do and other available configs.
The SDS tool requires the names of the software available on each system. You can provide this information in two different ways: curated and/or raw output.
- Curated data should be in the form of a CSV file with the following requirements:
  - If you are using Docker, the file must be named software.csv
  - The first line must have column names, and the following columns are necessary (only the software column needs any data): software, resource, software_description, software_versions. Here, resource refers to a specific cluster.
  - Here is an example of a CSV file:
software,software_description,software_versions,resource
ACTC,ACTC converts independent triangles into triangle strips or fans.,1.1,cluster1
ACTC,ACTC converts independent triangles into triangle strips or fans., 1.3,cluster1
ANTLR,,"2.7.7-Java-11,2.6",cluster2
A software.csv file with just the column names is already provided.
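Before loading a curated file, it can help to confirm that it meets the requirements above. The following is a minimal sketch using only the Python standard library; the function name check_software_csv is an assumption for illustration and is not part of SDS, and the checks cover only the two rules stated above (required column names, non-empty software column).

```python
import csv
import io

# Column names required by the curated CSV format described above.
REQUIRED_COLUMNS = {"software", "resource", "software_description", "software_versions"}


def check_software_csv(text):
    """Return a list of problems found in curated CSV text (empty if none).

    Hypothetical helper for illustration; not shipped with SDS.
    """
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return ["missing columns: " + ", ".join(sorted(missing))]
    problems = []
    for row in reader:
        # Only the software column needs data; the others may be blank.
        if not (row["software"] or "").strip():
            problems.append(f"line {reader.line_num}: empty software name")
    return problems


example = """software,software_description,software_versions,resource
ACTC,ACTC converts independent triangles into triangle strips or fans.,1.1,cluster1
ANTLR,,"2.7.7-Java-11,2.6",cluster2
"""
print(check_software_csv(example))  # prints []
```

To check a file on disk, pass its contents, for example check_software_csv(open("software.csv", newline="").read()).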
To obtain the raw output, use the collector.py script located in this repo. The COLLECTOR.md file goes over how to use it. The collector.py script will create the proper directory structure for each type of data.
If you would rather collect the data manually, the rest of this section covers how to format that data.
Raw data is the raw output of a specific command or a supported file type (SDS will parse it and extract the software info).
All files for this section must be within subdirectories. The name of each subdirectory should be the name of the resource to which the files belong. Here, resource refers to a specific cluster.
- module spider output (Lmod)
  - If you use Lmod for managing packages/environments, run
  module spider
  on your system and save the output to a text file.
  - If you are using Docker to run SDS, the parent directory must be named spider_data
- Container definition (.def) files
  - You can also provide container definition files within the proper resource directory, and the SDS tool will attempt to parse them and extract any relevant software information. The name of the .def file is treated as the container name.
  - If you are using the Docker version of SDS, the parent directory must be named container_data
  - Aside from the raw .def file, you can also add curated information for specific containers in a CSV file or in a custom SDS comment block (see the PARSER.md file). All CSV files must have a software_name column and a container_file or definition_file column. Here is the complete list of supported columns: software_name, software_versions, container_name, definition_file, container_file, notes, command.
    - If no container_name is provided, the definition_file name will be used as the container name. The CSV file is meant to supplement the .def files so that you can provide data the parser may have missed, or provide extra information for specific containers (such as notes on how to run them).
  - You can provide only the .csv file, only the .def files, or both. Information will only be added to and not overwritten.
  - Here is an example .csv file for a container:
software_name,software_versions,container_name,definition_file,container_file,container_description,notes,command
adapterremoval,2.3.2,,/share/singularity/adapterremoval.def,/share/singularity/adapterremoval.sinf,,,singularity run --app adapterremoval232 /share/singularity/share/singularity/afterqc.sinf AdapterRemoval
afterqc,0.9.7,,/share/singularity/afterqc,/share/singularity/afterqc.sinf,,,singularity run --app afterqc097 /share/singularity/share/singularity/afterqc.sinf python /usr/local/Miniconda3/envs/afterqc-0.9.7/bin/after.py -1 R1.fq.gz
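Because container CSV rows often have several empty fields, it is easy to drop a comma and shift a value into the wrong column. The following is a minimal sketch of a sanity check, assuming only the column rules stated above; the function name check_container_csv and the sample rows are hypothetical, and SDS itself may be more lenient about short rows.

```python
import csv
import io


def check_container_csv(text):
    """Return a list of problems in container CSV text (empty if none).

    Hypothetical helper for illustration; not shipped with SDS.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header = [name.strip() for name in rows[0]]
    problems = []
    if "software_name" not in header:
        problems.append("missing column: software_name")
    if "container_file" not in header and "definition_file" not in header:
        problems.append("missing column: container_file or definition_file")
    # Row numbers count the header as row 1; blank lines are ignored.
    for row_no, row in enumerate(rows[1:], start=2):
        if row and len(row) != len(header):
            problems.append(f"row {row_no}: {len(row)} fields, header has {len(header)}")
    return problems


# Hypothetical sample data: the second data row is missing two fields.
sample = (
    "software_name,software_versions,definition_file,notes\n"
    "exampletool,1.0,/path/to/exampletool.def,hypothetical row\n"
    "shortrow,2.0\n"
)
print(check_container_csv(sample))
# prints ['row 3: 2 fields, header has 4']
```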
Here is an example of a proper directory structure for the data:
SDS
├──container_data/
| └── {resource_name}/
| ├── {resource_name}.csv # CSV file with container metadata
| └── {preserved_directory_structure}/
| └── {definition_files} # Original definition files with paths preserved
|
├──spider_data/
| └── {resource_name}/
| └── {resource_name}_spider.txt # Complete output from module spider
|
└──software.csv
Note for container files: If you define your container file/definition file location using the SDS comment block (see the PARSER.md file), you do not need a preserved directory structure. In that case, your container_data directory structure would look like this:
container_data/{resource_name}/{definition_files}
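If you are collecting data manually rather than with collector.py, the layout above can be created with a few lines of Python. This is a minimal sketch under the assumption that the directory and file names match the tree shown above; the function names make_skeleton and save_spider_output are hypothetical and not part of SDS.

```python
from pathlib import Path


def make_skeleton(root, resources):
    """Create container_data/ and spider_data/ subdirectories per resource.

    Hypothetical helper for illustration; not shipped with SDS.
    """
    root = Path(root)
    for resource in resources:
        (root / "container_data" / resource).mkdir(parents=True, exist_ok=True)
        (root / "spider_data" / resource).mkdir(parents=True, exist_ok=True)


def save_spider_output(root, resource, text):
    """Write saved `module spider` output to
    spider_data/{resource}/{resource}_spider.txt and return the path."""
    target = Path(root) / "spider_data" / resource / f"{resource}_spider.txt"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target
```

For example, make_skeleton(".", ["cluster1", "cluster2"]) run from the project directory creates both parent directories with one subdirectory per cluster; the text passed to save_spider_output is whatever you captured from module spider on that cluster.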
The built-in parser will attempt to gather software information based on the data provided, but it may not always be successful, depending on your naming scheme.
You can define how the built-in parser parses your information. View the PARSER.md file for more details on standard formats and how to modify the parsers.