Commit 845cca2

Merge pull request #24 from j-andrews7/clastr
CLASTR functionality added
2 parents 1bfd725 + c1d8fcf commit 845cca2

21 files changed: +1832 -599 lines changed

.github/workflows/draft-pdf.yml

Lines changed: 0 additions & 23 deletions
This file was deleted.

.gitignore

Lines changed: 3 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -25,6 +25,7 @@ share/python-wheels/
2525
.installed.cfg
2626
*.egg
2727
MANIFEST
28+
.conda/*
2829

2930
# PyInstaller
3031
# Usually these files are written by a python script from a template
@@ -152,4 +153,5 @@ cython_debug/
152153
#.idea/
153154
.DS_Store
154155

155-
strprofiler.json
156+
strprofiler.json
157+
testing.xlsx

CHANGELOG.md

Lines changed: 8 additions & 0 deletions
@@ -1,5 +1,13 @@
11
# Changelog
22

3+
## v0.3.0
4+
5+
**Release date: 05/30/2024**
6+
7+
- Added ability to query the CLASTR API for single or batch queries from within the STRprofiler
8+
app - [#24](https://github.com/j-andrews7/strprofiler/pull/24).
9+
- Numerous UI tweaks for a more compact experience.
10+
311
## v0.2.0
412

513
**Release date: 04/16/2024**

README.md

Lines changed: 86 additions & 23 deletions
@@ -8,7 +8,7 @@
88
[![PyPI license](https://img.shields.io/pypi/l/strprofiler.svg)](https://pypi.python.org/pypi/strprofiler/)
99
[![DOI](https://zenodo.org/badge/523477912.svg)](https://zenodo.org/badge/latestdoi/523477912)
1010

11-
**STRprofiler** is a simple python utility to compare short tandem repeat (STR) profiles. In particular, it is designed to aid research labs in comparing models (e.g. cell lines & xenografts) generated from primary tissue samples to ensure contamination has not occurred. It includes basic checks for sample mixing and contamination.
11+
**STRprofiler** is a Python package, CLI tool, and Shiny application for comparing short tandem repeat (STR) profiles. In particular, it is designed to help research labs compare models (e.g. cell lines & xenografts) generated from primary tissue samples to ensure contamination has not occurred. It includes basic checks for sample mixing and contamination and provides a simple interface for querying the [Cellosaurus database via the CLASTR API](https://www.cellosaurus.org/str-search/).
1212

1313
**STRprofiler is intended only for research purposes.**
1414

@@ -49,24 +49,77 @@ Full usage information can be found by running `strprofiler --help`.
4949

5050
STRprofiler compares STR profiles to each other.
5151

52-
╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
53-
│ --tan_threshold -tanth FLOAT Minimum Tanabe score to report as potential matches in summary table. [default: 80] │
54-
│ --mas_q_threshold -masqth FLOAT Minimum Masters (vs. query) score to report as potential matches in summary table. [default: 80] │
55-
│ --mas_r_threshold -masrth FLOAT Minimum Masters (vs. reference) score to report as potential matches in summary table. [default: 80] │
56-
│ --mix_threshold -mix INTEGER Number of markers with >= 2 alleles allowed before a sample is flagged for potential mixing. |
57-
| [default: 3] │
58-
│ --sample_map -sm PATH Path to sample map in csv format for renaming. First column should be sample names as given |
59-
| in STR file(s), second should be new names to assign. No header. │
60-
│ --database -db PATH Path to an STR database file in csv, xlsx, tsv, or txt format. │
61-
│ --amel_col -acol TEXT Name of Amelogenin column in STR file(s). [default: AMEL] │
62-
│ --sample_col -scol TEXT Name of sample column in STR file(s). [default: Sample] │
63-
│ --marker_col -mcol TEXT Name of marker column in STR file(s). Only used if format is 'wide'. [default: Marker] │
64-
│ --penta_fix -pfix Whether to try to harmonize PentaE/D allele spelling. [default: True] │
65-
│ --score_amel -amel Use Amelogenin for similarity scoring. [default: False] │
66-
│ --output_dir -o PATH Path to the output directory. [default: ./STRprofiler] │
67-
│ --version Show the version and exit. │
68-
│ --help Show this message and exit. │
69-
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
52+
╭─ Options ────────────────────────────────────────────────────────────────────────────────╮
53+
│ --tan_threshold -tanth FLOAT Minimum Tanabe score to report as potential matches |
54+
| in summary table. [default: 80] │
55+
│ --mas_q_threshold -masqth FLOAT Minimum Masters (vs. query) score to report as |
56+
| potential matches in summary table. [default: 80] │
57+
│ --mas_r_threshold -masrth FLOAT Minimum Masters (vs. reference) score to report as |
58+
| potential matches in summary table. [default: 80] │
59+
│ --mix_threshold -mix INTEGER Number of markers with >= 2 alleles allowed before |
60+
| a sample is flagged for potential mixing. |
61+
| [default: 3] │
62+
│ --sample_map -sm PATH Path to sample map in csv format for renaming. |
63+
| First column should be sample names as given in |
64+
| STR file(s), second should be new names to assign. |
65+
| No header. │
66+
│ --database -db PATH Path to an STR database file in csv, xlsx, tsv, |
67+
| or txt format. │
68+
│ --amel_col -acol STR Name of Amelogenin column in STR file(s). |
69+
| [default: 'AMEL'] │
70+
│ --sample_col -scol STR Name of sample column in STR file(s). |
71+
| [default: 'Sample'] │
72+
│ --marker_col -mcol STR Name of marker column in STR file(s). |
73+
| Only used if format is 'wide'. [default: 'Marker'] │
74+
│ --penta_fix -pfix FLAG Whether to try to harmonize PentaE/D allele |
75+
| spelling. [default: True] │
76+
│ --score_amel -amel FLAG Use Amelogenin for similarity scoring. |
77+
| [default: False] │
78+
│ --output_dir -o PATH Path to the output directory. |
79+
| [default: ./STRprofiler] │
80+
│ --version Show the version and exit. │
81+
│ --help Show this message and exit. │
82+
╰──────────────────────────────────────────────────────────────────────────────────────────╯
83+
```
84+
85+
**CLASTR**
86+
87+
Additionally, the [Cellosaurus](https://www.cellosaurus.org/description.html) (Bairoch, 2018) cell line database can be queried via the [CLASTR](https://www.cellosaurus.org/str-search/) (Robin, Capes-Davis, and Bairoch, 2019) [REST API](https://www.cellosaurus.org/str-search/help.html#5).
88+
89+
`clastr -sm "SampleMap_exp.csv" -scol "Sample Name" -o ./strprofiler_output STR1.xlsx STR2.csv STR3.txt`
90+
91+
Full usage information can be found by running `clastr --help`.
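
The request that `clastr` sends on the user's behalf can be sketched roughly as follows. This is a minimal illustration, not STRprofiler's actual implementation: the endpoint URL and the JSON key names (`algorithm`, `scoringMode`, `scoreFilter`, `maxResults`, `minMarkers`, `includeAmelogenin`) are assumptions based on the CLASTR help page linked above, and `build_clastr_payload` and `query_clastr` are hypothetical helper names. The defaults mirror the CLI defaults listed in the options table.

```python
import json
import urllib.request

# Assumed single-query endpoint; check the CLASTR REST API help page before relying on it.
CLASTR_QUERY_URL = "https://www.cellosaurus.org/str-search/api/query/"


def build_clastr_payload(markers, algorithm=1, scoring_mode=1, score_filter=80,
                         max_results=200, min_markers=8, include_amelogenin=False):
    """Merge one STR profile with the search parameters into a JSON-ready dict.

    `markers` maps marker names to comma-separated alleles, e.g. {"TH01": "6,9.3"}.
    The parameter key names below are assumptions, not confirmed by this README.
    """
    payload = dict(markers)
    payload.update({
        "algorithm": algorithm,          # 1 Tanabe, 2 Masters (query), 3 Masters (reference)
        "scoringMode": scoring_mode,     # 1 non-empty, 2 query, 3 reference markers
        "scoreFilter": score_filter,
        "maxResults": max_results,
        "minMarkers": min_markers,
        "includeAmelogenin": include_amelogenin,
    })
    return payload


def query_clastr(payload, url=CLASTR_QUERY_URL, timeout=30):
    """POST the payload as JSON and decode the reply, assuming a JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_clastr_payload({"TH01": "6,9.3", "TPOX": "8"})
```

Calling `query_clastr(payload)` would perform the network request; it is left uncalled here so the sketch runs offline.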
92+
93+
```bash
94+
Usage: clastr [OPTIONS] INPUT_FILES...
95+
96+
**clastr** compares STR profiles to the human Cellosaurus knowledge base using the CLASTR REST API.
97+
98+
╭─ Options ────────────────────────────────────────────────────────────────────────────────╮
99+
│ --search_algorithm -sa INT Search algorithm to use in the CLASTR query. |
100+
| 1 - Tanabe, 2 - Masters (vs. query); |
101+
| 3 - Masters (vs. reference) [default: 1] │
102+
│ --scoring_mode -sm INT Search mode to account for missing alleles in query or |
103+
| reference. 1 - Non-empty markers, 2 - Query markers, |
104+
| 3 - Reference markers. [default: 1] │
105+
│ --score_filter -sf INT Minimum score to report as potential matches in |
106+
| summary table. [default: 80] │
107+
│ --max_results -mr INT Filter defining the maximum number of results to be |
108+
| returned. [default: 200] │
109+
│ --min_markers -mm INT Filter defining the minimum number of markers for |
110+
| matches to be reported. [default: 8] │
111+
│ --sample_col -scol STR Name of sample column in STR file(s). |
112+
| [default: 'Sample'] │
113+
│ --marker_col -mcol STR Name of marker column in STR file(s). |
114+
| Only used if format is 'wide'. [default: 'Marker'] │
115+
│ --penta_fix -pfix FLAG Whether to try to harmonize PentaE/D allele spelling. |
116+
| [default: True] │
117+
│ --score_amel -amel FLAG Use Amelogenin for similarity scoring. [default: False] │
118+
│ --output_dir -o PATH Path to the output directory. [default: ./STRprofiler] │
119+
│ --version Show the version and exit. │
120+
│ --help Show this message and exit. │
121+
╰──────────────────────────────────────────────────────────────────────────────────────────╯
122+
70123
```
71124
72125
## Input File(s)
@@ -149,6 +202,10 @@ In addition to the marker columns, this output contains the following columns:
149202
| **masters_query_score** | Masters (vs query) similarity score. |
150203
| **masters_ref_score** | Masters (vs reference) similarity score. |
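
For orientation, the three scores can be sketched using their commonly cited definitions: Tanabe is twice the shared alleles divided by the total alleles in both profiles, and the Masters scores divide the shared alleles by the query or reference allele count. This is an illustrative sketch only. The function name `str_similarity` and the dictionary keys are hypothetical, and restricting the comparison to markers present in both profiles is an assumption; STRprofiler's own handling of missing markers and Amelogenin may differ.

```python
def str_similarity(query, reference):
    """Return Tanabe and Masters scores (as percentages) for two STR profiles.

    Each profile maps marker names to an iterable of allele strings.
    Only markers with alleles in both profiles are scored (an assumption).
    """
    shared = n_query = n_ref = 0
    for marker in query.keys() & reference.keys():
        q_alleles = set(query[marker])
        r_alleles = set(reference[marker])
        if not q_alleles or not r_alleles:
            continue  # skip markers that are empty in either profile
        shared += len(q_alleles & r_alleles)
        n_query += len(q_alleles)
        n_ref += len(r_alleles)

    if n_query == 0 or n_ref == 0:
        return {"tanabe": 0.0, "masters_query": 0.0, "masters_ref": 0.0}

    return {
        "tanabe": 100 * 2 * shared / (n_query + n_ref),
        "masters_query": 100 * shared / n_query,
        "masters_ref": 100 * shared / n_ref,
    }


scores = str_similarity(
    {"TH01": ["6", "9.3"], "TPOX": ["8"]},
    {"TH01": ["6", "9.3"], "TPOX": ["8", "11"]},
)
```

In this example every query allele is found in the reference, so the Masters (vs query) score is higher than the Masters (vs reference) score, which is penalized for the reference's extra TPOX allele.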
151204
205+
**clastr**
206+
207+
Output for `clastr` is provided in XLSX format. Results follow the CLASTR format, documented here: https://www.cellosaurus.org/str-search/help.html#4
208+
152209
## Database Comparison
153210
154211
**STRprofiler** can also be used to compare batches of samples against a larger database of samples.
@@ -163,7 +220,7 @@ New in v0.2.0 is `strprofiler-app`, a command that launches a Shiny application
163220
164221
This application can provide a convenient portal to a group's STR database and can be hosted on standard Shiny servers, Posit Connect instances, or ShinyApps.io.
165222
166-
An example of the application can be seen [here](https://hg99x7-jared0andrews.shinyapps.io/strprofiler/).
223+
An example of the application can be seen [here](https://sj-bakerlab.shinyapps.io/strprofiler/).
167224
168225
### Deploying an `strprofiler` App
169226
@@ -202,8 +259,14 @@ You can contribute by creating [issues](https://github.com/j-andrews7/strprofile
202259
203260
**STRprofiler** is released on the MIT license. You are free to use, modify, or redistribute it in almost any way, provided you state changes to the code, disclose the source, and use the same license. It is released with zero warranty for any purpose and the authors retain no liability for its use. [Read the full license](https://github.com/j-andrews7/strprofiler/blob/master/LICENSE) for additional details.
204261
205-
## Reference
262+
## References
263+
264+
If you use **STRprofiler** in your research, please cite the DOI:
265+
266+
Jared Andrews, Mike Lloyd, & Sam Culley. (2024). j-andrews7/strprofiler: v0.3.0 (v0.3.0). Zenodo. https://doi.org/10.5281/zenodo.7348386
267+
268+
If you use the `clastr` command or functionality from the Shiny application, please cite the Cellosaurus and CLASTR publications:
206269
207-
If you use **strprofiler** in your research, please cite the DOI:
270+
Bairoch A. (2018) The Cellosaurus, a cell line knowledge resource. Journal of Biomolecular Techniques. 29:25-38. DOI: 10.7171/jbt.18-2902-002; PMID: 29805321
208271
209-
Jared Andrews, Mike Lloyd, & Sam Culley. (2024). j-andrews7/strprofiler: v0.2.0 (v0.2.0). Zenodo. https://doi.org/10.5281/zenodo.7348386
272+
Robin, T., Capes-Davis, A. & Bairoch, A. (2019) CLASTR: the Cellosaurus STR Similarity Search Tool - A Precious Help for Cell Line Authentication. International Journal of Cancer. DOI: 10.1002/IJC.32639; PMID: 31444973

app.py

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
from strprofiler.shiny_app.shiny_app import create_app
2+
3+
app = create_app()

docs/index.rst

Lines changed: 19 additions & 10 deletions
@@ -54,10 +54,19 @@ Usage
5454
.. autofunction:: strprofiler.strprofiler.strprofiler
5555

5656

57+
Querying CLASTR
58+
===============
59+
60+
**STRprofiler** can also be used to directly query CLASTR via their API.
61+
This can be done from within the Shiny application or from the command line via the ``clastr`` command or using the ``clastr_query`` function directly:
62+
63+
.. autofunction:: strprofiler.clastr.clastr_query
64+
5765
Input File(s)
5866
~~~~~~~~~~~~~~
5967

60-
**STRprofiler** can take either a single STR file or multiple STR files as input. These files can be csv, tsv, tab-separated text, or xlsx (first sheet used) files. The STR file(s) should be in either 'wide' or 'long' format. The long format expects all columns to map to the markers except for the designated sample name column with each row reflecting a different profile, e.g.:
68+
**STRprofiler** can take either a single STR file or multiple STR files as input.
69+
These files can be csv, tsv, tab-separated text, or xlsx (first sheet used) files. The STR file(s) should be in either 'wide' or 'long' format. The long format expects all columns to map to the markers except for the designated sample name column with each row reflecting a different profile, e.g.:
6170

6271
+--------+---------+---------+---------+--------+---------+--------+
6372
| Sample | D1S1656 | DYS391 | D3S1358 | D2S441 | D16S539 | D5S818 |
@@ -139,7 +148,7 @@ The wide format expects a line for each marker for each sample, e.g.:
139148
| Sample2 | FGA | 21 | 294.67 | 11941 | | | | |
140149
+--------------+-----------+-------------+---------+-------------+-------------+---------+-------------+-------------+
141150

142-
In this format, the `marker_col` must be specified. Only columns beginning with "Allele" will be used to parse the alleles for each sample/marker. Any other size or height columns will be ignored.
151+
In this format, the ``marker_col`` must be specified. Only columns beginning with "Allele" will be used to parse the alleles for each sample/marker. Any other size or height columns will be ignored.
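
The column-selection rule described above can be sketched as follows. This is a simplified illustration rather than STRprofiler's own parser: ``parse_wide`` is a hypothetical helper, and the column names in the example data (``Allele 1``, ``Size 1``, ``Height 1``, ``Allele 2``) are assumed for demonstration. The defaults for ``sample_col`` and ``marker_col`` follow the CLI defaults.

```python
import csv
import io
from collections import defaultdict


def parse_wide(handle, sample_col="Sample", marker_col="Marker"):
    """Collect alleles per sample and marker from a wide-format CSV.

    Only columns whose names begin with "Allele" are read; size and
    height columns are ignored, as are empty allele cells.
    """
    profiles = defaultdict(dict)
    for row in csv.DictReader(handle):
        alleles = [
            value.strip()
            for column, value in row.items()
            if column and column.startswith("Allele") and value and value.strip()
        ]
        profiles[row[sample_col]][row[marker_col]] = alleles
    return dict(profiles)


# Hypothetical input: one line per marker per sample.
example = io.StringIO(
    "Sample,Marker,Allele 1,Size 1,Height 1,Allele 2\n"
    "Sample2,FGA,21,294.67,11941,\n"
    "Sample2,TH01,6,170.1,9000,9.3\n"
)
profiles = parse_wide(example)
```

Here the empty ``Allele 2`` cell for FGA is dropped, so that marker ends up with a single allele.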
143152

144153
Output Files
145154
~~~~~~~~~~~~
@@ -201,16 +210,16 @@ Database Comparison
201210
202211
In this mode, inputs are compared against the database samples only, and not among themselves. Outputs will be as described above for sample input(s).
203212

204-
The `STRprofiler` App
205-
=====================
213+
The ``STRprofiler`` App
214+
=======================
206215

207-
New in v0.2.0 is `strprofiler-app`, a command that launches a Shiny application that allows for user queries against an uploaded or pre-defined database (provided with the `-db` parameter) of STR profiles.
216+
New in v0.2.0 is ``strprofiler-app``, a CLI command that launches a Shiny application that lets users query an uploaded database of STR profiles or a pre-defined one supplied with the ``-db`` parameter.
208217

209218
This application can provide a convenient portal to a group's STR database and can be hosted on standard Shiny servers, Posit Connect instances, or ShinyApps.io.
210219

211-
An example of the application can be seen `here <https://hg99x7-jared0andrews.shinyapps.io/strprofiler/>`__.
220+
An example of the application can be seen `here <https://sj-bakerlab.shinyapps.io/strprofiler/>`__.
212221

213-
Deploying an ``strprofiler`` App
222+
Deploying an ``STRprofiler`` App
214223
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
215224

216225
Building an app for deployment to any of the above options is simple.
@@ -258,13 +267,13 @@ You can contribute by creating `issues <https://github.com/j-andrews7/strprofile
258267
License
259268
=======
260269

261-
**strprofiler** is released on the MIT license. You are free to use, modify, or redistribute it in almost any way, provided you state changes to the code, disclose the source, and use the same license. It is released with zero warranty for any purpose and I retain no liability for its use. `Read the full license <https://github.com/j-andrews7/strprofiler/blob/master/LICENSE>`_ for additional details.
270+
**STRprofiler** is released on the MIT license. You are free to use, modify, or redistribute it in almost any way, provided you state changes to the code, disclose the source, and use the same license. It is released with zero warranty for any purpose and I retain no liability for its use. `Read the full license <https://github.com/j-andrews7/strprofiler/blob/master/LICENSE>`_ for additional details.
262271

263272
Reference
264273
=========
265274

266-
If you use **strprofiler** in your research, please cite the following:
267-
Jared Andrews, Mike Lloyd, & Sam Culley. (2024). j-andrews7/strprofiler: v0.2.0 (v0.2.0). Zenodo. https://doi.org/10.5281/zenodo.7348386
275+
If you use **STRprofiler** in your research, please cite the following:
276+
Jared Andrews, Mike Lloyd, & Sam Culley. (2024). j-andrews7/strprofiler: v0.3.0 (v0.3.0). Zenodo. https://doi.org/10.5281/zenodo.7348386
268277

269278
Indices and tables
270279
==================

docs/requirements.txt

Lines changed: 5 additions & 1 deletion
@@ -4,4 +4,8 @@ myst-parser
44
rich-click
55
shiny
66
shinyswatch
7-
faicons
7+
faicons
8+
requests
9+
flatten-json
10+
json
11+
requests
