CLASTR WIP #24

Merged
merged 30 commits into from
May 30, 2024
Commits
8a9c4f1
Merge pull request #14 from j-andrews7/dev
j-andrews7 Apr 17, 2024
5961969
clastr api proof of concept
MikeWLloyd May 1, 2024
9916ad9
query row added, UI adjustment
MikeWLloyd May 2, 2024
9bba28f
tooltip added, help updated, req for deploy updated
MikeWLloyd May 2, 2024
1cd2ef4
add window title
MikeWLloyd May 3, 2024
8645320
clastr batch method rough in
MikeWLloyd May 9, 2024
1bd91a9
add requirements, bump version
j-andrews7 May 14, 2024
579a1cf
fix for #26
MikeWLloyd May 16, 2024
77529be
additional tweaks for #26
j-andrews7 May 16, 2024
771c133
add marker check for single query
MikeWLloyd May 16, 2024
815036f
conditional batch options. modal notice for malformed markers.
MikeWLloyd May 16, 2024
3d5d412
global clastr function
MikeWLloyd May 20, 2024
2696ce9
clastr unit test
MikeWLloyd May 20, 2024
3191368
catch non-int thresholds
MikeWLloyd May 21, 2024
27f04c2
fix for #25, docstrings added
MikeWLloyd May 24, 2024
8e1a338
NoneType catch
MikeWLloyd May 24, 2024
e5e4a30
doc updates
MikeWLloyd May 24, 2024
26c2883
Fix #28
j-andrews7 May 28, 2024
38bdb08
remove debug print statement
MikeWLloyd May 28, 2024
4505bd6
linting, minor UI tweaks
j-andrews7 May 29, 2024
2f87d58
more linting
j-andrews7 May 29, 2024
30a3689
Help file typos & formatting
j-andrews7 May 29, 2024
158b2de
update lock file
j-andrews7 May 29, 2024
aed4430
Update CHANGELOG.md
j-andrews7 May 30, 2024
9dd839d
Remove old paper drafts and JOSS workflow
j-andrews7 May 30, 2024
4e940f7
add version, i hate scrolling
j-andrews7 May 30, 2024
c62eaba
doc updates
j-andrews7 May 30, 2024
fdeda54
format README
j-andrews7 May 30, 2024
8f0588d
add CLASTR reference
j-andrews7 May 30, 2024
c1d8fcf
Update README.md
j-andrews7 May 30, 2024
23 changes: 0 additions & 23 deletions .github/workflows/draft-pdf.yml

This file was deleted.

4 changes: 3 additions & 1 deletion .gitignore
@@ -25,6 +25,7 @@ share/python-wheels/
.installed.cfg
*.egg
MANIFEST
.conda/*

# PyInstaller
# Usually these files are written by a python script from a template
@@ -152,4 +153,5 @@ cython_debug/
#.idea/
.DS_Store

strprofiler.json
strprofiler.json
testing.xlsx
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,13 @@
# Changelog

## v0.3.0

**Release date: 05/30/2024**

- Added ability to query the CLASTR API for single or batch queries from within the STRprofiler
app - [#24](https://github.com/j-andrews7/strprofiler/pull/24).
- Numerous UI tweaks for a more compact experience.

## v0.2.0

**Release date: 04/16/2024**
109 changes: 86 additions & 23 deletions README.md
@@ -8,7 +8,7 @@
[![PyPI license](https://img.shields.io/pypi/l/strprofiler.svg)](https://pypi.python.org/pypi/strprofiler/)
[![DOI](https://zenodo.org/badge/523477912.svg)](https://zenodo.org/badge/latestdoi/523477912)

**STRprofiler** is a simple python utility to compare short tandem repeat (STR) profiles. In particular, it is designed to aid research labs in comparing models (e.g. cell lines & xenografts) generated from primary tissue samples to ensure contamination has not occurred. It includes basic checks for sample mixing and contamination.
**STRprofiler** is a python package, CLI tool, and Shiny application to compare short tandem repeat (STR) profiles. In particular, it is designed to aid research labs in comparing models (e.g. cell lines & xenografts) generated from primary tissue samples to ensure contamination has not occurred. It includes basic checks for sample mixing and contamination and provides a simple interface to conveniently query the [Cellosaurus database via the CLASTR API](https://www.cellosaurus.org/str-search/).

**STRprofiler is intended only for research purposes.**

@@ -49,24 +49,77 @@ Full usage information can be found by running `strprofiler --help`.

STRprofiler compares STR profiles to each other.

╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --tan_threshold -tanth FLOAT Minimum Tanabe score to report as potential matches in summary table. [default: 80] │
│ --mas_q_threshold -masqth FLOAT Minimum Masters (vs. query) score to report as potential matches in summary table. [default: 80] │
│ --mas_r_threshold -masrth FLOAT Minimum Masters (vs. reference) score to report as potential matches in summary table. [default: 80] │
│ --mix_threshold -mix INTEGER Number of markers with >= 2 alleles allowed before a sample is flagged for potential mixing. |
| [default: 3] │
│ --sample_map -sm PATH Path to sample map in csv format for renaming. First column should be sample names as given |
| in STR file(s), second should be new names to assign. No header. │
│ --database -db PATH Path to an STR database file in csv, xlsx, tsv, or txt format. │
│ --amel_col -acol TEXT Name of Amelogenin column in STR file(s). [default: AMEL] │
│ --sample_col -scol TEXT Name of sample column in STR file(s). [default: Sample] │
│ --marker_col -mcol TEXT Name of marker column in STR file(s). Only used if format is 'wide'. [default: Marker] │
│ --penta_fix -pfix Whether to try to harmonize PentaE/D allele spelling. [default: True] │
│ --score_amel -amel Use Amelogenin for similarity scoring. [default: False] │
│ --output_dir -o PATH Path to the output directory. [default: ./STRprofiler] │
│ --version Show the version and exit. │
│ --help Show this message and exit. │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────╮
│ --tan_threshold -tanth FLOAT Minimum Tanabe score to report as potential matches |
| in summary table. [default: 80] │
│ --mas_q_threshold -masqth FLOAT Minimum Masters (vs. query) score to report as |
| potential matches in summary table. [default: 80] │
│ --mas_r_threshold -masrth FLOAT Minimum Masters (vs. reference) score to report as |
| potential matches in summary table. [default: 80] │
│ --mix_threshold -mix INTEGER Number of markers with >= 2 alleles allowed before |
| a sample is flagged for potential mixing. |
| [default: 3] │
│ --sample_map -sm PATH Path to sample map in csv format for renaming. |
| First column should be sample names as given in |
| STR file(s), second should be new names to assign. |
| No header. │
│ --database -db PATH Path to an STR database file in csv, xlsx, tsv, |
| or txt format. │
│ --amel_col -acol STR Name of Amelogenin column in STR file(s). |
| [default: 'AMEL'] │
│ --sample_col -scol STR Name of sample column in STR file(s). |
| [default: 'Sample'] │
│ --marker_col -mcol STR Name of marker column in STR file(s). |
| Only used if format is 'wide'. [default: 'Marker'] │
│ --penta_fix -pfix FLAG Whether to try to harmonize PentaE/D allele |
| spelling. [default: True] │
│ --score_amel -amel FLAG Use Amelogenin for similarity scoring. |
| [default: False] │
│ --output_dir -o PATH Path to the output directory. |
| [default: ./STRprofiler] │
│ --version Show the version and exit. │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────╯
```
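
The `--sample_map` option above describes a two-column CSV with no header: original sample names first, new names second. A minimal sketch of that renaming step is shown below. It is an illustration only, not the package's own implementation; `load_sample_map`, `rename_samples`, and the example sample names are hypothetical.

```python
import csv
import io


def load_sample_map(handle):
    """Read a two-column, header-less CSV into an {old_name: new_name} dict."""
    mapping = {}
    for row in csv.reader(handle):
        # Skip blank lines and rows that lack a second column.
        if len(row) < 2 or not row[0].strip():
            continue
        mapping[row[0].strip()] = row[1].strip()
    return mapping


def rename_samples(names, mapping):
    """Apply the map; names without an entry are left unchanged (an assumption)."""
    return [mapping.get(name, name) for name in names]


# Hypothetical map contents, standing in for a file such as SampleMap_exp.csv.
example_csv = io.StringIO("S1_raw,Patient1_Tumor\nS2_raw,Patient1_PDX\n")
sample_map = load_sample_map(example_csv)
renamed = rename_samples(["S1_raw", "S2_raw", "S3_raw"], sample_map)
print(renamed)
```

Leaving unmapped names untouched is a design choice made for this sketch; how STRprofiler itself treats samples missing from the map should be checked against its documentation.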

**CLASTR**

Additionally, the [Cellosaurus](https://www.cellosaurus.org/description.html) (Bairoch, 2018) cell line database can be queried via the [CLASTR](https://www.cellosaurus.org/str-search/) (Robin, Capes-Davis, and Bairoch, 2019) [REST API](https://www.cellosaurus.org/str-search/help.html#5).

`clastr -scol "Sample Name" -o ./strprofiler_output STR1.xlsx STR2.csv STR3.txt`

Full usage information can be found by running `clastr --help`.

```bash
Usage: clastr [OPTIONS] INPUT_FILES...

**clastr** compares STR profiles to the human Cellosaurus knowledge base using the CLASTR REST API.

╭─ Options ────────────────────────────────────────────────────────────────────────────────╮
│ --search_algorithm -sa INT Search algorithm to use in the CLASTR query. |
| 1 - Tanabe, 2 - Masters (vs. query); |
| 3 - Masters (vs. reference) [default: 1] │
│ --scoring_mode -sm INT Search mode to account for missing alleles in query or |
| reference. 1 - Non-empty markers, 2 - Query markers, |
| 3 - Reference markers. [default: 1] │
│ --score_filter -sf INT Minimum score to report as potential matches in |
| summary table. [default: 80] │
│ --max_results -mr INT Filter defining the maximum number of results to be |
| returned. [default: 200] │
│ --min_markers -mm INT Filter defining the minimum number of markers for |
| matches to be reported. [default: 8] │
│ --sample_col -scol STR Name of sample column in STR file(s). |
| [default: 'Sample'] │
│ --marker_col -mcol STR Name of marker column in STR file(s). |
| Only used if format is 'wide'. [default: 'Marker'] │
│ --penta_fix -pfix FLAG Whether to try to harmonize PentaE/D allele spelling. |
| [default: True] │
│ --score_amel -amel FLAG Use Amelogenin for similarity scoring. [default: False] │
│ --output_dir -o PATH Path to the output directory. [default: ./STRprofiler] │
│ --version Show the version and exit. │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────╯

```
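
To make the options above more concrete, the sketch below collects them into a single query dictionary of the kind that could be sent as JSON to the CLASTR REST API. It is an assumption-laden illustration, not the `clastr` command's actual code: `build_clastr_payload` is a hypothetical helper, and the key names (`algorithm`, `scoringMode`, `scoreFilter`, `maxResults`, `minMarkers`, `includeAmelogenin`) are modeled on the CLASTR API help page and should be verified there before use. No request is sent.

```python
import json


def build_clastr_payload(markers, search_algorithm=1, scoring_mode=1,
                         score_filter=80, max_results=200, min_markers=8,
                         score_amel=False):
    """Combine query settings and marker alleles into one dict (hypothetical helper)."""
    # Both options accept only the codes 1, 2, or 3 listed in the help text.
    if search_algorithm not in (1, 2, 3):
        raise ValueError("search_algorithm must be 1, 2, or 3")
    if scoring_mode not in (1, 2, 3):
        raise ValueError("scoring_mode must be 1, 2, or 3")
    payload = {
        # Key names below are assumptions; check them against the CLASTR API help.
        "algorithm": search_algorithm,
        "scoringMode": scoring_mode,
        "scoreFilter": score_filter,
        "maxResults": max_results,
        "minMarkers": min_markers,
        "includeAmelogenin": score_amel,
    }
    # Marker alleles are passed as comma-separated strings keyed by marker name.
    payload.update(markers)
    return payload


# Hypothetical two-marker profile using the defaults shown in the help text.
payload = build_clastr_payload({"D5S818": "11,12", "TH01": "6,9.3"})
print(json.dumps(payload, indent=2))
```

Validating the integer codes before building the payload keeps malformed settings from reaching the remote service.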

## Input File(s)
@@ -149,6 +202,10 @@ In addition to the marker columns, this output contains the following columns:
| **masters_query_score** | Masters (vs query) similarity score. |
| **masters_ref_score** | Masters (vs reference) similarity score. |
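
The three similarity scores can be illustrated with a short sketch using the commonly cited definitions: Tanabe is twice the number of shared alleles divided by the total alleles in both profiles, and each Masters score is the number of shared alleles divided by the alleles in the query or the reference, all expressed as percentages. This is a sketch under stated assumptions, not STRprofiler's own code: `similarity_scores` is a hypothetical function, the `tanabe_score` key is an assumed name, and restricting the comparison to markers that are non-empty in both profiles is an assumption.

```python
def similarity_scores(query, reference):
    """Return Tanabe and Masters scores (percent) for two {marker: set_of_alleles} profiles."""
    # Assumption: only markers with alleles in both profiles are compared.
    markers = [m for m in query if query[m] and reference.get(m)]
    n_query = sum(len(query[m]) for m in markers)
    n_ref = sum(len(reference[m]) for m in markers)
    n_shared = sum(len(query[m] & reference[m]) for m in markers)
    if n_query == 0 or n_ref == 0:
        # Nothing comparable, so report zero rather than dividing by zero.
        return {"tanabe_score": 0.0, "masters_query_score": 0.0,
                "masters_ref_score": 0.0}
    return {
        "tanabe_score": 200.0 * n_shared / (n_query + n_ref),
        "masters_query_score": 100.0 * n_shared / n_query,
        "masters_ref_score": 100.0 * n_shared / n_ref,
    }


# Hypothetical profiles: 4 shared alleles, 5 query alleles, 6 reference alleles.
query = {"D5S818": {"11", "12"}, "TH01": {"6", "9.3"}, "TPOX": {"8"}}
reference = {"D5S818": {"11", "12"}, "TH01": {"6", "7"}, "TPOX": {"8", "11"}}
scores = similarity_scores(query, reference)
print(scores)
```

In this example the Masters (vs query) score is 80 because four of the five query alleles are shared, while the Tanabe and Masters (vs reference) scores are lower because the reference carries an extra allele.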

**clastr**

Output for `clastr` is provided in XLSX format. Results follow the CLASTR format, documented here: https://www.cellosaurus.org/str-search/help.html#4

## Database Comparison

**STRprofiler** can also be used to compare batches of samples against a larger database of samples.
@@ -163,7 +220,7 @@ New in v0.2.0 is `strprofiler-app`, a command that launches a Shiny application

This application can provide a convenient portal to a group's STR database and can be hosted on standard Shiny servers, Posit Connect instances, or ShinyApps.io.

An example of the application can be seen [here](https://hg99x7-jared0andrews.shinyapps.io/strprofiler/).
An example of the application can be seen [here](https://sj-bakerlab.shinyapps.io/strprofiler/).

### Deploying an `strprofiler` App

@@ -202,8 +259,14 @@ You can contribute by creating [issues](https://github.com/j-andrews7/strprofile

**STRprofiler** is released on the MIT license. You are free to use, modify, or redistribute it in almost any way, provided you state changes to the code, disclose the source, and use the same license. It is released with zero warranty for any purpose and the authors retain no liability for its use. [Read the full license](https://github.com/j-andrews7/strprofiler/blob/master/LICENSE) for additional details.

## Reference
## References

If you use **STRprofiler** in your research, please cite the DOI:

Jared Andrews, Mike Lloyd, & Sam Culley. (2024). j-andrews7/strprofiler: v0.3.0 (v0.3.0). Zenodo. https://doi.org/10.5281/zenodo.7348386

If you use the `clastr` command or functionality from the Shiny application, please cite the Cellosaurus and CLASTR publications:

If you use **strprofiler** in your research, please cite the DOI:
Bairoch A. (2018) The Cellosaurus, a cell line knowledge resource. Journal of Biomolecular Techniques. 29:25-38. DOI: 10.7171/jbt.18-2902-002; PMID: 29805321

Jared Andrews, Mike Lloyd, & Sam Culley. (2024). j-andrews7/strprofiler: v0.2.0 (v0.2.0). Zenodo. https://doi.org/10.5281/zenodo.7348386
Robin, T., Capes-Davis, A. & Bairoch, A. (2019) CLASTR: the Cellosaurus STR Similarity Search Tool - A Precious Help for Cell Line Authentication. International Journal of Cancer. PubMed: 31444973  DOI: 10.1002/IJC.32639
3 changes: 3 additions & 0 deletions app.py
@@ -0,0 +1,3 @@
from strprofiler.shiny_app.shiny_app import create_app

app = create_app()
29 changes: 19 additions & 10 deletions docs/index.rst
@@ -54,10 +54,19 @@ Usage
.. autofunction:: strprofiler.strprofiler.strprofiler


Querying CLASTR
===============

**STRprofiler** can also be used to query CLASTR directly via its API.
This can be done from within the Shiny application, from the command line via the ``clastr`` command, or by calling the ``clastr_query`` function directly:

.. autofunction:: strprofiler.clastr.clastr_query

Input File(s)
~~~~~~~~~~~~~~

**STRprofiler** can take either a single STR file or multiple STR files as input. These files can be csv, tsv, tab-separated text, or xlsx (first sheet used) files. The STR file(s) should be in either 'wide' or 'long' format. The long format expects all columns to map to the markers except for the designated sample name column with each row reflecting a different profile, e.g.:
**STRprofiler** can take either a single STR file or multiple STR files as input.
These files can be csv, tsv, tab-separated text, or xlsx (first sheet used) files. The STR file(s) should be in either 'wide' or 'long' format. The long format expects all columns to map to the markers except for the designated sample name column with each row reflecting a different profile, e.g.:

+--------+---------+---------+---------+--------+---------+--------+
| Sample | D1S1656 | DYS391 | D3S1358 | D2S441 | D16S539 | D5S818 |
@@ -139,7 +148,7 @@ The wide format expects a line for each marker for each sample, e.g.:
| Sample2 | FGA | 21 | 294.67 | 11941 | | | | |
+--------------+-----------+-------------+---------+-------------+-------------+---------+-------------+-------------+

In this format, the `marker_col` must be specified. Only columns beginning with "Allele" will be used to parse the alleles for each sample/marker. Any other size or height columns will be ignored.
In this format, the ``marker_col`` must be specified. Only columns beginning with "Allele" will be used to parse the alleles for each sample/marker. Any other size or height columns will be ignored.

Output Files
~~~~~~~~~~~~
@@ -201,16 +210,16 @@ Database Comparison

In this mode, inputs are compared against the database samples only, and not among themselves. Outputs will be as described above for sample input(s).

The `STRprofiler` App
=====================
The ``STRprofiler`` App
=======================

New in v0.2.0 is `strprofiler-app`, a command that launches a Shiny application that allows for user queries against an uploaded or pre-defined database (provided with the `-db` parameter) of STR profiles.
New in v0.2.0 is ``strprofiler-app``, a CLI command that launches a Shiny application that allows for user queries against an uploaded or pre-defined database (provided with the ``-db`` parameter) of STR profiles.

This application can provide a convenient portal to a group's STR database and can be hosted on standard Shiny servers, Posit Connect instances, or ShinyApps.io.

An example of the application can be seen `here <https://hg99x7-jared0andrews.shinyapps.io/strprofiler/>`__.
An example of the application can be seen `here <https://sj-bakerlab.shinyapps.io/strprofiler/>`__.

Deploying an ``strprofiler`` App
Deploying an ``STRprofiler`` App
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Building an app for deployment to any of the above options is simple.
@@ -258,13 +267,13 @@ You can contribute by creating `issues <https://github.com/j-andrews7/strprofile
License
=======

**strprofiler** is released on the MIT license. You are free to use, modify, or redistribute it in almost any way, provided you state changes to the code, disclose the source, and use the same license. It is released with zero warranty for any purpose and I retain no liability for its use. `Read the full license <https://github.com/j-andrews7/strprofiler/blob/master/LICENSE>`_ for additional details.
**STRprofiler** is released on the MIT license. You are free to use, modify, or redistribute it in almost any way, provided you state changes to the code, disclose the source, and use the same license. It is released with zero warranty for any purpose and I retain no liability for its use. `Read the full license <https://github.com/j-andrews7/strprofiler/blob/master/LICENSE>`_ for additional details.

Reference
=========

If you use **strprofiler** in your research, please cite the following:
Jared Andrews, Mike Lloyd, & Sam Culley. (2024). j-andrews7/strprofiler: v0.2.0 (v0.2.0). Zenodo. https://doi.org/10.5281/zenodo.7348386
If you use **STRprofiler** in your research, please cite the following:
Jared Andrews, Mike Lloyd, & Sam Culley. (2024). j-andrews7/strprofiler: v0.3.0 (v0.3.0). Zenodo. https://doi.org/10.5281/zenodo.7348386

Indices and tables
==================
6 changes: 5 additions & 1 deletion docs/requirements.txt
@@ -4,4 +4,8 @@ myst-parser
rich-click
shiny
shinyswatch
faicons
faicons
requests
flatten-json
json
requests