j-andrews7 · j-andrews7 · May 30, 2024 · Apr 17, 2024 · May 1, 2024 · May 2, 2024
diff --git a/.github/workflows/draft-pdf.yml b/.github/workflows/draft-pdf.yml
diff --git a/.gitignore b/.gitignore
@@ -25,6 +25,7 @@ share/python-wheels/
 .installed.cfg
 *.egg
 MANIFEST
+.conda/*
 
 # PyInstaller
 #  Usually these files are written by a python script from a template
@@ -152,4 +153,5 @@ cython_debug/
 #.idea/
 .DS_Store
 
-strprofiler.json
+strprofiler.json
+testing.xlsx
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,13 @@
 # Changelog
 
+## v0.3.0
+
+**Release date: 05/30/2024**
+
+ - Added ability to query the CLASTR API for single or batch queries from within the STRprofiler 
+ app - [#24](https://github.com/j-andrews7/strprofiler/pull/24).
+ - Numerous UI tweaks for a more compact experience.
+
 ## v0.2.0
 
 **Release date: 04/16/2024**

diff --git a/README.md b/README.md
@@ -8,7 +8,7 @@
 [![PyPI license](https://img.shields.io/pypi/l/strprofiler.svg)](https://pypi.python.org/pypi/strprofiler/)
 [![DOI](https://zenodo.org/badge/523477912.svg)](https://zenodo.org/badge/latestdoi/523477912)
 
-**STRprofiler** is a simple python utility to compare short tandem repeat (STR) profiles. In particular, it is designed to aid research labs in comparing models (e.g. cell lines & xenografts) generated from primary tissue samples to ensure contamination has not occurred. It includes basic checks for sample mixing and contamination.
+**STRprofiler** is a Python package, CLI tool, and Shiny application to compare short tandem repeat (STR) profiles. In particular, it is designed to aid research labs in comparing models (e.g. cell lines & xenografts) generated from primary tissue samples to ensure contamination has not occurred. It includes basic checks for sample mixing and contamination and provides a simple interface for querying the [Cellosaurus database via the CLASTR API](https://www.cellosaurus.org/str-search/).
 
 **STRprofiler is intended only for research purposes.**
 
@@ -49,24 +49,77 @@ Full usage information can be found by running `strprofiler --help`.
 
  STRprofiler compares STR profiles to each other.  
 
-╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
-│ --tan_threshold    -tanth   FLOAT        Minimum Tanabe score to report as potential matches in summary table. [default: 80]                          │
-│ --mas_q_threshold  -masqth  FLOAT        Minimum Masters (vs. query) score to report as potential matches in summary table. [default: 80]             │
-│ --mas_r_threshold  -masrth  FLOAT        Minimum Masters (vs. reference) score to report as potential matches in summary table. [default: 80]         │
-│ --mix_threshold    -mix     INTEGER      Number of markers with >= 2 alleles allowed before a sample is flagged for potential mixing.                 |
-|                                            [default: 3]                                                                                               │
-│ --sample_map       -sm      PATH         Path to sample map in csv format for renaming. First column should be sample names as given                  |
-|                                            in STR file(s),  second should be new names to assign. No header.                                          │
-│ --database         -db      PATH         Path to an STR database file in csv, xlsx, tsv, or txt format.                                               │
-│ --amel_col         -acol    TEXT         Name of Amelogenin column in STR file(s). [default: AMEL]                                                    │
-│ --sample_col       -scol    TEXT         Name of sample column in STR file(s). [default: Sample]                                                      │
-│ --marker_col       -mcol    TEXT         Name of marker column in STR file(s). Only used if format is 'wide'. [default: Marker]                       │
-│ --penta_fix        -pfix                 Whether to try to harmonize PentaE/D allele spelling. [default: True]                                        │
-│ --score_amel       -amel                 Use Amelogenin for similarity scoring. [default: False]                                                      │
-│ --output_dir       -o       PATH         Path to the output directory. [default: ./STRprofiler]                                                       │
-│ --version                                Show the version and exit.                                                                                   │
-│ --help                                   Show this message and exit.                                                                                  │
-╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
+╭─ Options ────────────────────────────────────────────────────────────────────────────────╮
+│ --tan_threshold    -tanth   FLOAT    Minimum Tanabe score to report as potential matches │
+│                                      in summary table. [default: 80]                     │
+│ --mas_q_threshold  -masqth  FLOAT    Minimum Masters (vs. query) score to report as      │
+│                                      potential matches in summary table. [default: 80]   │
+│ --mas_r_threshold  -masrth  FLOAT    Minimum Masters (vs. reference) score to report as  │
+│                                      potential matches in summary table. [default: 80]   │
+│ --mix_threshold    -mix     INTEGER  Number of markers with >= 2 alleles allowed before  │
+│                                      a sample is flagged for potential mixing.           │
+│                                      [default: 3]                                        │
+│ --sample_map       -sm      PATH     Path to sample map in csv format for renaming.      │
+│                                      First column should be sample names as given in     │
+│                                      STR file(s), second should be new names to assign.  │
+│                                      No header.                                          │
+│ --database         -db      PATH     Path to an STR database file in csv, xlsx, tsv,     │
+│                                      or txt format.                                      │
+│ --amel_col         -acol    STR      Name of Amelogenin column in STR file(s).           │
+│                                      [default: 'AMEL']                                   │
+│ --sample_col       -scol    STR      Name of sample column in STR file(s).               │
+│                                      [default: 'Sample']                                 │
+│ --marker_col       -mcol    STR      Name of marker column in STR file(s).               │
+│                                      Only used if format is 'wide'. [default: 'Marker']  │
+│ --penta_fix        -pfix    FLAG     Whether to try to harmonize PentaE/D allele         │
+│                                      spelling. [default: True]                           │
+│ --score_amel       -amel    FLAG     Use Amelogenin for similarity scoring.              │
+│                                      [default: False]                                    │
+│ --output_dir       -o       PATH     Path to the output directory.                       │
+│                                      [default: ./STRprofiler]                            │
+│ --version                            Show the version and exit.                          │
+│ --help                               Show this message and exit.                         │
+╰──────────────────────────────────────────────────────────────────────────────────────────╯
+```
+
+**CLASTR**
+
+Additionally, the `clastr` command can query the [Cellosaurus](https://www.cellosaurus.org/description.html) (Bairoch, 2018) cell line database via the [CLASTR](https://www.cellosaurus.org/str-search/) (Robin, Capes-Davis, and Bairoch, 2019) [REST API](https://www.cellosaurus.org/str-search/help.html#5).
+
+`clastr -scol "Sample Name" -o ./strprofiler_output STR1.xlsx STR2.csv STR3.txt`
+
+Full usage information can be found by running `clastr --help`.
+
+```bash
+ Usage: clastr [OPTIONS] INPUT_FILES...   
+
+clastr compares STR profiles to the human Cellosaurus knowledge base using the CLASTR REST API.
+
+╭─ Options ────────────────────────────────────────────────────────────────────────────────╮
+│ --search_algorithm  -sa    INT  Search algorithm to use in the CLASTR query.             │
+│                                 1 - Tanabe, 2 - Masters (vs. query),                     │
+│                                 3 - Masters (vs. reference) [default: 1]                 │
+│ --scoring_mode      -sm    INT  Search mode to account for missing alleles in query or   │
+│                                 reference. 1 - Non-empty markers, 2 - Query markers,     │
+│                                 3 - Reference markers. [default: 1]                      │
+│ --score_filter      -sf    INT  Minimum score to report as potential matches in          │
+│                                 summary table. [default: 80]                             │
+│ --max_results       -mr    INT  Filter defining the maximum number of results to be      │
+│                                 returned. [default: 200]                                 │
+│ --min_markers       -mm    INT  Filter defining the minimum number of markers for        │
+│                                 matches to be reported. [default: 8]                     │
+│ --sample_col        -scol  STR  Name of sample column in STR file(s).                    │
+│                                 [default: 'Sample']                                      │
+│ --marker_col        -mcol  STR  Name of marker column in STR file(s).                    │
+│                                 Only used if format is 'wide'. [default: 'Marker']       │
+│ --penta_fix         -pfix  FLAG Whether to try to harmonize PentaE/D allele spelling.    │
+│                                 [default: True]                                          │
+│ --score_amel        -amel  FLAG Use Amelogenin for similarity scoring. [default: False]  │
+│ --output_dir        -o     PATH Path to the output directory. [default: ./STRprofiler]   │
+│ --version                       Show the version and exit.                               │
+│ --help                          Show this message and exit.                              │
+╰──────────────────────────────────────────────────────────────────────────────────────────╯
+
 ```
 
 ## Input File(s)
@@ -149,6 +202,10 @@ In addition to the marker columns, this output contains the following columns:
 | **masters_query_score** | Masters (vs query) similarity score.                         |
 | **masters_ref_score**   | Masters (vs reference) similarity score.                     |
 
+**clastr**
+
+Output for `clastr` is provided in XLSX format. Results follow the CLASTR output format, documented in the [CLASTR help](https://www.cellosaurus.org/str-search/help.html#4).
+
 ## Database Comparison
 
 **STRprofiler** can also be used to compare batches of samples against a larger database of samples.
@@ -163,7 +220,7 @@ New in v0.2.0 is `strprofiler-app`, a command that launches a Shiny application
 
 This application can provide a convenient portal to a group's STR database and can be hosted on standard Shiny servers, Posit Connect instances, or ShinyApps.io. 
 
-An example of the application can be seen [here](https://hg99x7-jared0andrews.shinyapps.io/strprofiler/).
+An example of the application can be seen [here](https://sj-bakerlab.shinyapps.io/strprofiler/).
 
 ### Deploying an `strprofiler` App
 
@@ -202,8 +259,14 @@ You can contribute by creating [issues](https://github.com/j-andrews7/strprofile
 
 **STRprofiler** is released under the MIT license. You are free to use, modify, or redistribute it in almost any way, provided you include the original copyright and license notice. It is released with zero warranty for any purpose and the authors retain no liability for its use. [Read the full license](https://github.com/j-andrews7/strprofiler/blob/master/LICENSE) for additional details.
 
-## Reference
+## References
+
+If you use **STRprofiler** in your research, please cite the DOI:
+
+Jared Andrews, Mike Lloyd, & Sam Culley. (2024). j-andrews7/strprofiler: v0.3.0 (v0.3.0). Zenodo. https://doi.org/10.5281/zenodo.7348386
+
+If you use the `clastr` command or the CLASTR query functionality of the Shiny application, please cite the Cellosaurus and CLASTR publications:
 
-If you use **strprofiler** in your research, please cite the DOI:
+Bairoch A. (2018) The Cellosaurus, a cell line knowledge resource. Journal of Biomolecular Techniques. 29:25-38. DOI: 10.7171/jbt.18-2902-002; PMID: 29805321 
 
-Jared Andrews, Mike Lloyd, & Sam Culley. (2024). j-andrews7/strprofiler: v0.2.0 (v0.2.0). Zenodo. https://doi.org/10.5281/zenodo.7348386
+Robin T., Capes-Davis A., Bairoch A. (2019) CLASTR: the Cellosaurus STR Similarity Search Tool - A Precious Help for Cell Line Authentication. International Journal of Cancer. DOI: 10.1002/ijc.32639; PMID: 31444973
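The `clastr` options listed in the README correspond to parameters of a CLASTR REST query. A minimal Python sketch of that mapping for a single profile follows; it is not STRprofiler's `clastr_query` implementation. `build_clastr_payload` and `post_clastr_query` are hypothetical helpers, and the endpoint URL and JSON field names (`algorithm`, `scoringMode`, `scoreFilter`, `maxResults`, `minMarkers`, `includeAmelogenin`, `outputFormat`) are assumptions that should be checked against the [CLASTR API documentation](https://www.cellosaurus.org/str-search/help.html#5) before use.

```python
import json
import urllib.request
from typing import Dict, Union

# ASSUMPTION: endpoint and field names below are unverified; check them against
# https://www.cellosaurus.org/str-search/help.html#5 before relying on them.
CLASTR_QUERY_URL = "https://www.cellosaurus.org/str-search/api/query/"

Payload = Dict[str, Union[str, int, bool]]


def build_clastr_payload(profile: Dict[str, str],
                         search_algorithm: int = 1,
                         scoring_mode: int = 1,
                         score_filter: int = 80,
                         max_results: int = 200,
                         min_markers: int = 8,
                         score_amel: bool = False) -> Payload:
    """Hypothetical helper: combine one STR profile with clastr-style options.

    `profile` maps marker names to comma-separated allele calls, e.g. {"TH01": "6,9.3"}.
    Defaults mirror the `clastr --help` defaults shown in the README.
    """
    if search_algorithm not in (1, 2, 3):
        raise ValueError("search_algorithm: 1 = Tanabe, 2 = Masters (vs. query), 3 = Masters (vs. reference)")
    if scoring_mode not in (1, 2, 3):
        raise ValueError("scoring_mode: 1 = non-empty markers, 2 = query markers, 3 = reference markers")
    payload: Payload = dict(profile)  # marker entries are passed through unchanged
    payload.update({
        "algorithm": search_algorithm,      # assumed field name
        "scoringMode": scoring_mode,        # assumed field name
        "scoreFilter": score_filter,        # assumed field name
        "maxResults": max_results,          # assumed field name
        "minMarkers": min_markers,          # assumed field name
        "includeAmelogenin": score_amel,    # assumed field name
        "outputFormat": "json",             # assumed field name and value
    })
    return payload


def post_clastr_query(payload: Payload, url: str = CLASTR_QUERY_URL, timeout: float = 30.0):
    """POST the payload as JSON and return the decoded JSON response (performs a network call)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the payload is purely local; only `post_clastr_query(build_clastr_payload({"TH01": "6,9.3"}))` would contact the server. Invalid algorithm or scoring-mode codes are rejected before any request is sent.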
diff --git a/app.py b/app.py
@@ -0,0 +1,3 @@
+from strprofiler.shiny_app.shiny_app import create_app
+
+app = create_app()
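The README's `--tan_threshold`, `--mas_q_threshold`, and `--mas_r_threshold` options, the `masters_query_score`/`masters_ref_score` output columns, and the `clastr --search_algorithm` choices all refer to Tanabe and Masters similarity scores. The sketch below uses the commonly cited definitions: Tanabe = 2 × shared alleles / (query alleles + reference alleles) × 100, Masters (vs. query) = shared alleles / query alleles × 100, and Masters (vs. reference) = shared alleles / reference alleles × 100. It is not STRprofiler's scoring code; `allele_counts` and `similarity_scores` are hypothetical helpers, and the marker-selection rule (only markers typed in both profiles, with `AMEL` skipped unless `score_amel=True`) is an assumption that may differ from how STRprofiler or CLASTR treat missing markers.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: marker name -> set of allele calls.
Profile = Dict[str, Set[str]]


def allele_counts(query: Profile, reference: Profile,
                  score_amel: bool = False) -> Tuple[int, int, int]:
    """Return (query alleles, reference alleles, shared alleles) over markers typed in both."""
    n_query = n_ref = n_shared = 0
    for marker in query.keys() & reference.keys():
        # Assumption: Amelogenin is excluded unless requested (cf. --score_amel).
        if marker == "AMEL" and not score_amel:
            continue
        q_alleles, r_alleles = query[marker], reference[marker]
        if not q_alleles or not r_alleles:
            continue  # skip markers that are empty in either profile
        n_query += len(q_alleles)
        n_ref += len(r_alleles)
        n_shared += len(q_alleles & r_alleles)
    return n_query, n_ref, n_shared


def similarity_scores(query: Profile, reference: Profile,
                      score_amel: bool = False) -> Tuple[float, float, float]:
    """Return (Tanabe, Masters vs. query, Masters vs. reference) as percentages."""
    n_query, n_ref, n_shared = allele_counts(query, reference, score_amel)
    if n_query == 0 or n_ref == 0:
        return 0.0, 0.0, 0.0  # nothing comparable
    tanabe = 2 * n_shared / (n_query + n_ref) * 100
    masters_query = n_shared / n_query * 100
    masters_ref = n_shared / n_ref * 100
    return tanabe, masters_query, masters_ref


query = {"D5S818": {"11", "12"}, "TH01": {"6", "9.3"}, "AMEL": {"X"}}
reference = {"D5S818": {"11", "12"}, "TH01": {"6"}, "AMEL": {"X", "Y"}}
tanabe, mas_q, mas_r = similarity_scores(query, reference)
print(f"Tanabe {tanabe:.1f}, Masters (query) {mas_q:.1f}, Masters (ref) {mas_r:.1f}")
# Tanabe 85.7, Masters (query) 75.0, Masters (ref) 100.0
```

With the default threshold of 80 from the README, this pair would pass the Tanabe and Masters (vs. reference) cut-offs but not the Masters (vs. query) one, which is why the three thresholds are configurable separately.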
diff --git a/docs/index.rst b/docs/index.rst
@@ -54,10 +54,19 @@ Usage
 .. autofunction:: strprofiler.strprofiler.strprofiler
 
 
+Querying CLASTR
+===============
+
+**STRprofiler** can also query CLASTR directly via its REST API.
+This can be done from within the Shiny application, from the command line with the ``clastr`` command, or in Python by calling the ``clastr_query`` function directly:
+
+.. autofunction:: strprofiler.clastr.clastr_query
+
 Input File(s)
 ~~~~~~~~~~~~~~
 
-**STRprofiler** can take either a single STR file or multiple STR files as input. These files can be csv, tsv, tab-separated text, or xlsx (first sheet used) files. The STR file(s) should be in either 'wide' or 'long' format. The long format expects all columns to map to the markers except for the designated sample name column with each row reflecting a different profile, e.g.:
+**STRprofiler** can take either a single STR file or multiple STR files as input. 
+These files can be csv, tsv, tab-separated text, or xlsx (first sheet used) files. The STR file(s) should be in either 'wide' or 'long' format. The long format expects all columns to map to the markers except for the designated sample name column with each row reflecting a different profile, e.g.:
 
 +--------+---------+---------+---------+--------+---------+--------+
 | Sample | D1S1656 |  DYS391 | D3S1358 | D2S441 | D16S539 | D5S818 | 
@@ -139,7 +148,7 @@ The wide format expects a line for each marker for each sample, e.g.:
 | Sample2      |  FGA      | 21          | 294.67  | 11941       |             |         |             |             |
 +--------------+-----------+-------------+---------+-------------+-------------+---------+-------------+-------------+
 
-In this format, the `marker_col` must be specified. Only columns beginning with "Allele" will be used to parse the alleles for each sample/marker. Any other size or height columns will be ignored.
+In this format, the ``marker_col`` must be specified. Only columns beginning with "Allele" will be used to parse the alleles for each sample/marker. Any other size or height columns will be ignored.
 
 Output Files
 ~~~~~~~~~~~~
@@ -201,16 +210,16 @@ Database Comparison
 
 In this mode, inputs are compared against the database samples only, and not among themselves. Outputs will be as described above for sample input(s).
 
-The `STRprofiler` App
-=====================
+The ``STRprofiler`` App
+=======================
 
-New in v0.2.0 is `strprofiler-app`, a command that launches a Shiny application that allows for user queries against an uploaded or pre-defined database (provided with the `-db` parameter) of STR profiles.
+New in v0.2.0 is ``strprofiler-app``, a CLI command that launches a Shiny application that allows for user queries against an uploaded or pre-defined database (provided with the ``-db`` parameter) of STR profiles.
 
 This application can provide a convenient portal to a group's STR database and can be hosted on standard Shiny servers, Posit Connect instances, or ShinyApps.io. 
 
-An example of the application can be seen `here <https://hg99x7-jared0andrews.shinyapps.io/strprofiler/>`__.
+An example of the application can be seen `here <https://sj-bakerlab.shinyapps.io/strprofiler/>`__.
 
-Deploying an ``strprofiler`` App
+Deploying an ``STRprofiler`` App
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Building an app for deployment to any of the above options is simple.
@@ -258,13 +267,13 @@ You can contribute by creating `issues <https://github.com/j-andrews7/strprofile
 License
 =======
 
-**strprofiler** is released on the MIT license. You are free to use, modify, or redistribute it in almost any way, provided you state changes to the code, disclose the source, and use the same license. It is released with zero warranty for any purpose and I retain no liability for its use. `Read the full license <https://github.com/j-andrews7/strprofiler/blob/master/LICENSE>`_ for additional details.
+**STRprofiler** is released under the MIT license. You are free to use, modify, or redistribute it in almost any way, provided you include the original copyright and license notice. It is released with zero warranty for any purpose and the authors retain no liability for its use. `Read the full license <https://github.com/j-andrews7/strprofiler/blob/master/LICENSE>`_ for additional details.
 
 Reference
 =========
 
-If you use **strprofiler** in your research, please cite the following:
-Jared Andrews, Mike Lloyd, & Sam Culley. (2024). j-andrews7/strprofiler: v0.2.0 (v0.2.0). Zenodo. https://doi.org/10.5281/zenodo.7348386
+If you use **STRprofiler** in your research, please cite the following:
+Jared Andrews, Mike Lloyd, & Sam Culley. (2024). j-andrews7/strprofiler: v0.3.0 (v0.3.0). Zenodo. https://doi.org/10.5281/zenodo.7348386
 
 Indices and tables
 ==================

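The 'wide' layout described in the input file section of the docs (one row per sample and marker, with only columns beginning with "Allele" used for allele calls and size/height columns ignored) can be illustrated with a small standard-library sketch. `wide_rows_to_profiles` is a hypothetical helper, not STRprofiler's parser, and the example headers (`Allele 1`, `Size 1`, `Height 1`, ...) and comma-joined output are assumptions for illustration.

```python
import csv
import io
from typing import Dict, Iterable, Optional


def wide_rows_to_profiles(rows: Iterable[Dict[Optional[str], str]],
                          sample_col: str = "Sample",
                          marker_col: str = "Marker") -> Dict[str, Dict[str, str]]:
    """Collapse one-row-per-marker records into one {marker: alleles} profile per sample."""
    profiles: Dict[str, Dict[str, str]] = {}
    for row in rows:
        alleles = []
        for column, value in row.items():
            # Only "Allele..." columns count; size/height columns are ignored.
            if column and column.startswith("Allele") and value and value.strip():
                alleles.append(value.strip())
        profiles.setdefault(row[sample_col], {})[row[marker_col]] = ",".join(alleles)
    return profiles


wide_csv = """Sample,Marker,Allele 1,Size 1,Height 1,Allele 2,Size 2,Height 2
Sample1,D5S818,11,140.1,9000,12,144.2,8700
Sample1,TH01,9.3,180.0,12000,,,
Sample2,FGA,21,294.67,11941,,,
"""
profiles = wide_rows_to_profiles(csv.DictReader(io.StringIO(wide_csv)))
print(profiles)
# {'Sample1': {'D5S818': '11,12', 'TH01': '9.3'}, 'Sample2': {'FGA': '21'}}
```

Passing `sample_col` and `marker_col` plays the same role as the `-scol` and `-mcol` options when a file's headers differ from the defaults.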
diff --git a/docs/requirements.txt b/docs/requirements.txt
@@ -4,4 +4,6 @@ myst-parser
 rich-click
 shiny
 shinyswatch
-faicons
+faicons
+requests
+flatten-json