Releases: Integrative-Transcriptomics/MUSIAL

MUSIAL-v2.4.1

23 May 12:58
77ce6bc

v2.4.1 (Minor Update, 23.05.2025)

  • Implementation of HDBSCAN* clustering of feature alleles and proteoforms using the Tribuo library (https://tribuo.org/): After the allele and proteoform sequences are inferred per sample, they are now clustered with Tribuo's HDBSCAN* algorithm to make the data easier to interpret, i.e., samples that fall into the same cluster for a feature can be considered similar even if they do not carry exactly the same set of variants for that feature. Clustering uses the L1 distance on binary vectors with one entry per available variant (position and alternative content) of the feature. In particular, this means that clustering is not stable across different sets of variants.
  • The clustering results are used to generate informative names for alleles and proteoforms: these names have been adapted to be used in the different output formats.
  • Improved naming convention for output files.
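
The distance described above can be sketched in a few lines of Python. This is only an illustration of L1 distance on binary variant vectors, not MUSIAL's or Tribuo's actual code; the sample names, variants, and helper functions (`encode`, `l1_distance`) are hypothetical.

```python
from itertools import combinations


def encode(sample_variants, variant_universe):
    """Binary vector: 1 if the sample carries the variant, else 0."""
    return [1 if v in sample_variants else 0 for v in variant_universe]


def l1_distance(a, b):
    """Sum of absolute coordinate differences (L1 / Manhattan distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical alleles of one feature, each a set of
# (position, alternative content) pairs.
alleles = {
    "s1": {(10, "A"), (25, "T")},
    "s2": {(10, "A"), (25, "T"), (40, "-")},
    "s3": {(77, "G")},
}

# The feature space is spanned by all variants observed for the feature.
universe = sorted(set().union(*alleles.values()))
vectors = {name: encode(vs, universe) for name, vs in alleles.items()}
distances = {
    (a, b): l1_distance(vectors[a], vectors[b])
    for a, b in combinations(sorted(vectors), 2)
}

# Dropping one variant from the universe changes the vectors and distances,
# which is one way to see why clusters depend on the variant set.
reduced = [v for v in universe if v != (40, "-")]
reduced_s1_s2 = l1_distance(encode(alleles["s1"], reduced),
                            encode(alleles["s2"], reduced))
```

In this toy example `s1` and `s2` differ by a single variant, so they sit close together, while `s3` shares no variant with either. Once the variant `(40, "-")` is removed from the universe, `s1` and `s2` become indistinguishable.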

MUSIAL-v2.4

19 May 13:42

v2.4.0

  • Improved abstraction and object orientation of the project structure: Reusable functions were restructured into extendable classes; functionality that is only used locally was moved into internal classes; redundant or unusable code was deleted.
  • Additional runtime and memory optimization: In particular, VCF processing and sequence export have been improved through capacity estimation and improved algorithms.
  • Full review of the code documentation: JavaDoc is now hosted on a GitHub Pages website.
  • Update of SnpEff to the latest version (5.2f 2025-02-07).
  • Improved CLI logging and user information as well as input parameter processing and description.
  • New functions added: Feature validation compliant with the Sequence Ontology (SO). Reference-free processing (without any reference sequence).

MUSIAL-v2.3.10

28 Mar 14:25

v2.3.0

  • Added container classes to increase abstraction.
  • Increased efficiency of computations and memory usage.
  • All variant calls are now stored in the build output, including reference calls. This reduces reference bias by rejecting calls.
  • Simplified input parameters (call quality is no longer considered).
  • Removed deprecated/redundant code.

v2.3.1 (Minor Update, 21.05.2024)

  • Improved compatibility with variants called by bcftools.
  • Switched to gzip for compression to remove OS-dependent errors.
  • Removed deprecated/redundant code.

v2.3.2 (Minor Update, 03.06.2024)

  • Added an option to the build task to specify the temporary working directory used by SnpEff.
  • Rejected variants are now indicated by a ? symbol instead of !.

v2.3.3 (Minor Update, 05.06.2024)

  • The option -u of the build task can now be used to write uncompressed storage files.
  • Improved resolution of complex InDels.
  • Non-haploid samples are no longer skipped; however, all input files are still treated as haploid.

v2.3.4 (Minor Update, 14.06.2024)

  • Bugfix for amino acid variant inference.

v2.3.5 (Minor Update, 26.06.2024)

  • Bugfix for amino acid and nucleotide variant inference: in some scenarios, InDels were treated as reference calls and ignored in downstream processing.

v2.3.6 (Minor Update, 28.06.2024)

  • Bugfix for processing deletions that exceed feature lengths during proteoform inference.

v2.3.7 (Minor Update, 10.07.2024)

  • Added an option to exclude explicit variants, in addition to positions, from the analysis (to tackle reference errors).
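
As a rough illustration of the difference between excluding a position and excluding an explicit variant, consider the following sketch. It does not reflect MUSIAL's CLI syntax or internal data model; the function `filter_variants`, its parameters, and the `(position, alternative content)` tuples are assumptions made for this example.

```python
def filter_variants(variants, excluded_positions=frozenset(),
                    excluded_variants=frozenset()):
    """Drop variants at excluded positions and explicitly excluded variants.

    Each variant is a hypothetical (position, alternative content) tuple.
    """
    return [
        (pos, alt)
        for pos, alt in variants
        if pos not in excluded_positions
        and (pos, alt) not in excluded_variants
    ]


calls = [(100, "A"), (100, "G"), (250, "T"), (300, "-")]

# Excluding position 250 removes every variant there, whereas excluding the
# explicit variant (100, "G") keeps the other variant at position 100.
kept = filter_variants(
    calls,
    excluded_positions={250},
    excluded_variants={(100, "G")},
)
```

Excluding an explicit variant is the finer-grained option: a suspected reference error at one position can be masked without discarding other alternative content observed at the same position.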

v2.3.8 (Minor Update, 26.07.2024)

  • Reference allele/proteoform is now stored independent of occurrence in samples.
  • Separator symbol ";" replaced by ",".
  • Extended nucleotide-variant storage logic to include variants that pass the filter criteria but are not the most frequent in a sample:
    • Nucleotide variants stored for a feature are, in this sense, either (i) the most frequent allele passing filter criteria, (ii) the most frequent allele failing filter criteria, i.e., an ambiguous call, or (iii) not the most frequent allele, but passing filter criteria, i.e., non-primary variants.
    • This is reflected by a primary attribute in the variant information.
    • Sequence and table export still only consider primary/most frequent variants.
    • Primary variants (derived from the most frequent allele of a call) that do not pass filter criteria are considered ambiguous variants. They are now stored as SNVs with content N, instead of a run of Ns spanning the length of a possible InDel, to reduce bias towards actual variants.
    • Actual variant content is now stored for ambiguous variants.
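
The three storage cases listed above can be summarized in a short sketch. It is a simplified reading of these release notes rather than MUSIAL's implementation; the `Allele` class, its fields, and the category labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Allele:
    content: str         # alternative content of the allele
    frequency: float     # allele frequency within the call
    passes_filter: bool  # whether the allele passes the filter criteria


def classify(alleles):
    """Return (content, category, primary) for every allele that is stored."""
    if not alleles:
        return []
    top = max(alleles, key=lambda a: a.frequency)
    stored = []
    for allele in alleles:
        if allele is top:
            # Cases (i) and (ii): the most frequent allele is always primary;
            # it is ambiguous if it fails the filter criteria.
            category = "primary" if allele.passes_filter else "ambiguous"
            stored.append((allele.content, category, True))
        elif allele.passes_filter:
            # Case (iii): passes the filter but is not the most frequent.
            stored.append((allele.content, "non-primary", False))
        # Alleles that are neither most frequent nor passing are not stored.
    return stored


clear_call = classify([Allele("T", 0.7, True), Allele("G", 0.3, True)])
unclear_call = classify([Allele("T", 0.6, False), Allele("G", 0.4, False)])
```

Here `clear_call` yields one primary and one non-primary variant, while `unclear_call` yields a single ambiguous variant, which the export would represent as an SNV with content N.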

v2.3.9 (Minor Update, 16.08.2024)

  • Bug fixes:
    • The .fai index files for the reference sequence are now overwritten on each run to avoid errors caused by changes in the file contents.
    • All do-while loops have been replaced by while loops to avoid errors due to empty iterators; this only happened when no reference allele was present in the sample input.
    • Fixed an error in the logical operation for filtering variants.
  • SnpEff now annotates ambiguous variants, i.e., filtered variants, with respect to their actual alternative nucleotide content; the annotation has also been extended to variants that are not located on coding genes.

v2.3.10 (Minor Update, 10.09.2024)

  • SnpEff has been updated to the latest version and minor parameter changes have been made to increase the efficiency of the SnpEff runtime.