Commit 808c640

Update code documentation.
1 parent 62a0acf commit 808c640

16 files changed, 108 additions and 280 deletions

src/main/java/datastructure/Attributable.java

Lines changed: 0 additions & 2 deletions
@@ -15,8 +15,6 @@
  * as key-value pairs in a {@link TreeMap}, allowing efficient retrieval, addition, extension, and removal
  * of attributes. It also supports operations like checking for the existence of attributes and converting
  * attributes to a string representation.
- *
- * @noinspection unused
  */
  public class Attributable {

src/main/java/datastructure/Contig.java

Lines changed: 0 additions & 25 deletions
@@ -14,11 +14,9 @@
  * This class models a segment of a reference sequence, which can represent a complete genome,
  * a plasmid, a single contig, or a scaffold. It extends the {@link Attributable} class to
  * inherit functionality for managing attributes associated with the contig.
- * </p>
  * <p>
  * Each instance of this class is uniquely identified by its {@code name} and contains
  * information about its nucleotide sequence, variants, and other relevant properties.
- * </p>
  */
  public class Contig extends Attributable {

@@ -28,7 +26,6 @@ public class Contig extends Attributable {
  * This field uniquely identifies the contig within the context of the application.
  * It is a final field, meaning its value is immutable once assigned during the
  * construction of the {@link Contig} instance.
- * </p>
  */
  public final String name;

@@ -38,10 +35,8 @@ public class Contig extends Attributable {
  * This field stores the nucleotide sequence of the contig. The sequence is expected to be
  * stored as a GZIP-compressed string to optimize storage. It may be empty or null if no
  * sequence is available for the contig.
- * </p>
  * <p>
  * <b>Note:</b> The sequence is not validated against the variants stored in the {@code variants} map.
- * </p>
  */
  protected final String sequence;

@@ -55,11 +50,9 @@ public class Contig extends Attributable {
  * <li>The third level (value: {@link VariantInformation}) contains additional information about the variant,
  * such as associations with {@link SequenceType}s or {@link Sample}s.</li>
  * </ul>
- * </p>
  * <p>
  * This structure allows storage and retrieval of variant data, enabling queries by position,
  * alternative content, and associated metadata.
- * </p>
  */
  protected final TreeMap<Integer, Map<String, VariantInformation>> variants;

@@ -69,11 +62,9 @@ public class Contig extends Attributable {
  * This field is a transient {@link HashMap} used to cache subsequences of the contig's sequence.
  * The keys in the map are {@link Tuple} objects representing the start and end positions of the subsequence,
  * and the values are the corresponding subsequences as {@link String}.
- * </p>
  * <p>
  * The cache is transient because it is not intended to be serialized, as it is dynamically populated
  * during runtime to optimize performance by avoiding redundant sequence decompression or retrieval.
- * </p>
  */
  protected transient HashMap<Tuple<Integer, Integer>, String> sequenceCache;

@@ -84,7 +75,6 @@ public class Contig extends Attributable {
  * initializes the {@link #variants} map to store variant information and the {@link #sequenceCache}
  * map to cache subsequences for optimized retrieval. The sequence is expected to be stored
  * as a GZIP-compressed string to reduce storage requirements.
- * </p>
  *
  * @param name The name or identifier of the contig.
  * @param sequence The nucleotide sequence of the contig, stored as a GZIP-compressed string.
@@ -103,7 +93,6 @@ protected Contig(String name, String sequence) {
  * This method determines whether the contig has a stored sequence by checking
  * if the {@code sequence} field is not empty. A non-empty sequence indicates
  * that the contig has an associated nucleotide sequence.
- * </p>
  *
  * @return {@code true} if the contig has a sequence (i.e., the sequence length is not zero),
  * {@code false} otherwise.
@@ -117,7 +106,6 @@ public boolean hasSequence() {
  * <p>
  * This method decompresses the GZIP-compressed sequence stored in the {@code sequence} field
  * and returns it as a string. If no sequence is stored, it returns an empty string.
- * </p>
  *
  * @return The decompressed nucleotide sequence of this contig, or an empty string if no sequence is stored.
  * @throws IOException If an error occurs during the decompression of the sequence.
@@ -136,12 +124,9 @@ public String getSequence() throws IOException {
  * specified start and end positions. The subsequence is cached to avoid redundant decompression
  * and substring operations for the same range. If the subsequence is already cached, it is
  * retrieved directly from the cache. Otherwise, it is computed, stored in the cache, and returned.
- * </p>
- *
  * <p>
  * The start and end positions are 1-based indices, meaning the first nucleotide in the sequence
  * is at position 1. If no sequence is stored for the contig, the method returns an empty string.
- * </p>
  *
  * @param start The 1-based indexed start position of the subsequence (inclusive).
  * @param end The 1-based indexed end position of the subsequence (exclusive).
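
The Javadoc in the hunk above describes a cache-then-compute lookup with a 1-based inclusive start and a 1-based exclusive end. The following is a minimal sketch of that behavior, not the project's code: `SubsequenceCacheSketch`, the `"start:end"` string key, the `computations` counter, and the range check are all assumptions made for illustration, and the sequence is held uncompressed instead of GZIP-compressed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the caching described above; not the project's Contig class.
class SubsequenceCacheSketch {
    // Assumption: the sequence is held uncompressed here; the real field is GZIP-compressed.
    private final String sequence;
    // Assumption: a "start:end" string key replaces the Tuple<Integer, Integer> key.
    private final Map<String, String> cache = new HashMap<>();
    // Counts how often a subsequence had to be computed instead of read from the cache.
    int computations = 0;

    SubsequenceCacheSketch(String sequence) {
        this.sequence = sequence == null ? "" : sequence;
    }

    // start is 1-based inclusive, end is 1-based exclusive, as stated in the Javadoc.
    String getSubsequence(int start, int end) {
        if (sequence.isEmpty()) {
            return ""; // no stored sequence: empty string, as documented
        }
        // Assumption: out-of-range requests are rejected; the diff does not show this.
        if (start < 1 || end < start || end > sequence.length() + 1) {
            throw new IllegalArgumentException("Invalid range " + start + ".." + end);
        }
        String key = start + ":" + end;
        String cached = cache.get(key);
        if (cached == null) {
            computations++;
            // Shift both 1-based bounds to Java's 0-based substring indices.
            cached = sequence.substring(start - 1, end - 1);
            cache.put(key, cached);
        }
        return cached;
    }
}
```

With the sequence `ACGTAC`, `getSubsequence(2, 5)` covers positions 2 to 4 and yields `CGT`; a second call with the same range is served from the cache.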
@@ -169,7 +154,6 @@ public String getSubsequence(int start, int end) throws IOException {
  * This method iterates through the hierarchical map of variants stored in the {@code variants} field.
  * It computes the total count by summing up the sizes of all inner maps, where each inner map represents
  * the alternative sequences for a specific position on the contig.
- * </p>
  *
  * @return The total number of variants located on this contig.
  */
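
The counting rule in the hunk above (sum the sizes of the inner maps) can be sketched as follows. This is an illustration under assumptions: `VariantCountSketch` and `countVariants` are hypothetical names, and a plain `String` stands in for `VariantInformation`, whose definition is not part of this diff.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; mirrors the position -> (alternative -> info) layout from the Javadoc.
class VariantCountSketch {
    // Assumption: String replaces VariantInformation as the innermost value type.
    static int countVariants(TreeMap<Integer, Map<String, String>> variants) {
        // Each inner map holds the alternatives at one position, so its size is
        // the number of variants at that position.
        return variants.values().stream().mapToInt(Map::size).sum();
    }
}
```

A contig with two alternatives at position 10 and one at position 25 would therefore report three variants.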
@@ -184,7 +168,6 @@ public int getVariantsCount() {
  * for a variant located at the specified position with the given alternative bases. The returned
  * {@link VariantInformation} contains details about the variant, including its occurrences in
  * samples and features, as well as any associated attributes.
- * </p>
  *
  * @param position The 1-based position of the variant on the contig.
  * @param alternativeBases The alternative base sequence of the variant.
@@ -203,14 +186,11 @@ public VariantInformation getVariantInformation(int position, String alternative
  * <li>The position of the variant (field {@code a} of the tuple).</li>
  * <li>The alternate allele of the variant (field {@code b} of the tuple).</li>
  * </ul>
- * </p>
- *
  * <p>
  * For each variant, the method retrieves the associated {@link VariantInformation} using
  * the position and alternate allele. It then checks if the {@code Constants.EFFECTS} attribute
  * is present. If the attribute is found, its value (a comma-separated string of effects) is split
  * into individual effects, which are trimmed and aggregated into a {@link Set} to ensure uniqueness.
- * </p>
  *
  * @param variants A list of {@link Tuple} objects representing the variants. Each tuple contains:
  * <ul>
@@ -236,7 +216,6 @@ public Set<String> getVariantsEffects(List<Tuple<Integer, String>> variants) {
  * <li>The position of the variant on the contig.</li>
  * <li>The alternative base sequence of the variant.</li>
  * </ul>
- * </p>
  *
  * @return An {@link ArrayList} of {@link Tuple} objects, where each tuple contains
  * the position and alternative base sequence of a variant.
@@ -261,7 +240,6 @@ public ArrayList<Tuple<Integer, String>> getVariants() {
  * <li>The position of the variant on the contig.</li>
  * <li>The alternative base sequence of the variant.</li>
  * </ul>
- * </p>
  *
  * @param start The 1-based indexed inclusive start position of the range.
  * @param end The 1-based indexed inclusive end position of the range.
@@ -289,7 +267,6 @@ public ArrayList<Tuple<Integer, String>> getVariantsByLocation(int start, int en
  * <li>The position of the variant on the contig.</li>
  * <li>The alternative base sequence of the variant.</li>
  * </ul>
- * </p>
  *
  * @param feature The feature to filter variants by.
  * @param alleleUids A set of allele unique identifiers to filter variants by.
@@ -320,7 +297,6 @@ public ArrayList<Tuple<Integer, String>> getVariantsByAlleles(Feature feature, S
  * <li>The position of the variant on the contig.</li>
  * <li>The alternative base sequence of the variant.</li>
  * </ul>
- * </p>
  *
  * @param sampleName The name of the sample to filter variants by.
  * @return An {@link ArrayList} of {@link Tuple} objects, where each tuple contains
@@ -349,7 +325,6 @@ public ArrayList<Tuple<Integer, String>> getVariantsBySample(String sampleName)
  * <li>The position of the variant on the contig.</li>
  * <li>The alternative base sequence of the variant.</li>
  * </ul>
- * </p>
  *
  * @param sampleName The name of the sample to filter variants by.
  * @param start The 1-based indexed inclusive start position of the location range.
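
The Javadoc for `getVariantsEffects` in this file describes looking up each (position, alternate allele) pair, reading a comma-separated effects attribute, and collecting the trimmed entries into a set. A rough sketch of that aggregation follows. Everything here is assumed for illustration: `VariantEffectsSketch`, the `"effects"` key (the real key is `Constants.EFFECTS`, whose value is not shown), `Map.Entry` in place of `Tuple`, and a plain attribute map in place of `VariantInformation`.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the effect aggregation; not the project's implementation.
class VariantEffectsSketch {
    // Assumption: placeholder for the value of Constants.EFFECTS.
    static final String EFFECTS = "effects";

    // variants: (position, alternate allele) pairs; Map.Entry stands in for Tuple.
    // store: position -> alternate allele -> attribute map (stands in for VariantInformation).
    static Set<String> collectEffects(
            List<Map.Entry<Integer, String>> variants,
            Map<Integer, Map<String, Map<String, String>>> store) {
        Set<String> effects = new TreeSet<>();
        for (Map.Entry<Integer, String> variant : variants) {
            Map<String, Map<String, String>> byAlternative = store.get(variant.getKey());
            if (byAlternative == null) continue; // no variant at this position
            Map<String, String> attributes = byAlternative.get(variant.getValue());
            if (attributes == null) continue; // unknown alternate allele
            String value = attributes.get(EFFECTS);
            if (value == null) continue; // attribute not present
            for (String effect : value.split(",")) {
                String trimmed = effect.trim();
                if (!trimmed.isEmpty()) effects.add(trimmed); // the set removes duplicates
            }
        }
        return effects;
    }
}
```

Skipping variants without stored information is a choice made for this sketch; the diff does not show how the real method treats them.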

src/main/java/datastructure/Feature.java

Lines changed: 1 addition & 16 deletions
@@ -21,11 +21,9 @@
  * This class models a genomic feature, such as a gene, exon, or coding sequence (CDS),
  * that is analyzed in the context of genomic data processing. It extends the {@link Attributable}
  * class to inherit functionality for managing attributes associated with the feature.
- * </p>
  * <p>
  * Each instance of this class is uniquely identified by its {@code name} and contains
  * information about its type, location on the reference genome, and other relevant properties.
- * </p>
  */
  public class Feature extends Attributable {

@@ -35,11 +33,9 @@ public class Feature extends Attributable {
  * This field specifies the type of the feature, such as "gene", "exon", or "CDS".
  * The type is defined according to the GFF3 specification and provides information
  * about the biological or functional classification of the feature.
- * </p>
  * <p>
  * For more details, refer to the GFF3 specification:
  * <a href="https://gmod.org/wiki/GFF3">https://gmod.org/wiki/GFF3</a>.
- * </p>
  */
  public final String type;

@@ -49,7 +45,6 @@ public class Feature extends Attributable {
  * This field uniquely identifies the feature within the context of the analysis.
  * It is a final field, meaning its value is immutable once assigned during the
  * construction of the {@link Feature} instance.
- * </p>
  */
  public final String name;

@@ -85,7 +80,6 @@ public class Feature extends Attributable {
  * a genomic feature. This class extends {@link SequenceType} and provides
  * functionality for managing alleles, including their unique identifiers,
  * variants, and attributes.
- * </p>
  */
  public class Allele extends SequenceType {

@@ -117,7 +111,6 @@ private Allele(String uid, List<Tuple<Integer, String>> variants) {
  * from the genomic feature. This class extends {@link SequenceType} and provides
  * functionality for managing proteoforms, including their unique identifiers,
  * variants, and attributes.
- * </p>
  */
  public class Proteoform extends SequenceType {

@@ -149,12 +142,10 @@ private Proteoform(String uid, List<Tuple<Integer, String>> variants) {
  * specific sequence variations of the feature. The keys in the map are unique
  * identifiers for the alleles, and the values are the corresponding {@link Allele}
  * instances.
- * </p>
  * <p>
  * Alleles are used to track and manage sequence variations resulting from genomic
  * changes. Each allele is linked to its unique identifier and contains information
  * about its sequence and attributes.
- * </p>
  */
  protected final HashMap<String, Allele> alleles = new HashMap<>();

@@ -165,11 +156,9 @@ private Proteoform(String uid, List<Tuple<Integer, String>> variants) {
  * specific sequence variants of proteins derived from the feature. The keys in the map
  * are unique identifiers for the proteoforms, and the values are the corresponding
  * {@link Proteoform} instances.
- * </p>
  * <p>
  * Proteoforms are only relevant for coding features and are used to track and manage
  * protein sequence variations resulting from genomic changes.
- * </p>
  */
  protected final HashMap<String, Proteoform> proteoforms = new HashMap<>();

@@ -179,7 +168,6 @@ private Proteoform(String uid, List<Tuple<Integer, String>> variants) {
  * This constructor initializes a genomic feature with its name, location, strand orientation,
  * type, and unique identifier. The feature's start and end positions are converted to integers
  * to ensure proper indexing. The {@link Attributable} superclass is also initialized.
- * </p>
  *
  * @param name The name of the feature, used as its internal identifier.
  * @param contig The name of the reference location (e.g., contig, chromosome, plasmid) where the feature is located.
@@ -302,9 +290,8 @@ public Collection<Allele> getAlleles() {
  * @throws MusialException If an error occurs during sequence alignment or translation.
  * @throws IllegalArgumentException If the allele does not exist, the contig does not match the feature's contig,
  * the contig lacks a sequence, the feature is not coding, or the variants map is empty.
- * @noinspection UnusedReturnValue
  */
- protected String updateProteoform(Contig contig, String alleleUid) throws IOException, MusialException {
+ protected void updateProteoform(Contig contig, String alleleUid) throws IOException, MusialException {
  // Check if the allele UID is valid.
  Allele allele = getAllele(alleleUid);
  if (allele == null)
@@ -403,8 +390,6 @@ protected String updateProteoform(Contig contig, String alleleUid) throws IOExce

  // Associate the allele with the proteoform.
  allele.setAttribute(Constants.$Allele_proteoform, proteoformUid);
-
- return proteoformUid;
  }

  /**

src/main/java/datastructure/Sample.java

Lines changed: 0 additions & 12 deletions
@@ -14,7 +14,6 @@
  * This class extends {@link Attributable} to inherit functionality for managing attributes.
  * It provides fields and methods to store and manipulate variant calls, alleles, and other
  * sample-specific data. Each instance of this class is uniquely identified by its {@code name}.
- * </p>
  */
  public class Sample extends Attributable {

@@ -24,7 +23,6 @@ public class Sample extends Attributable {
  * This field uniquely identifies the sample within the context of the application.
  * It is a final field, meaning its value is immutable once assigned during the
  * construction of the {@link Sample} instance.
- * </p>
  */
  public final String name;

@@ -49,7 +47,6 @@ public class Sample extends Attributable {
  * The format follows the <a href="https://samtools.github.io/hts-specs/VCFv4.2.pdf">VCFv4.2</a> specification.
  * </li>
  * </ul>
- * </p>
  */
  protected final HashMap<String, TreeMap<Integer, String>> variantCalls = new HashMap<>(2);

@@ -59,7 +56,6 @@ public class Sample extends Attributable {
  * This {@link Map} stores the relationship between feature names and their associated allele identifiers.
  * The keys represent the names of the features, and the values represent the unique identifiers of the alleles.
  * This structure is used to track which allele is associated with each feature in the sample.
- * </p>
  */
  protected final Map<String, String> alleles;

@@ -77,7 +73,6 @@ public class Sample extends Attributable {
  * <li>{@code REF_1:ALT_1:AD_1:PL_1,...}: One or more alternate alleles, each with their respective
  * reference allele, alternate allele, allele depth, and phred-scaled likelihoods.</li>
  * </ul>
- * </p>
  *
  * <pre>
  * Example match: {@code 1;13;99;TTC:.:0:585,TTC:T--:13:0}
@@ -92,7 +87,6 @@ public class Sample extends Attributable {
  * This constructor initializes a {@link Sample} object with the given name and allocates a {@link HashMap}
  * for the {@link #alleles} field with the specified initial capacity. The {@link #name} field is set to the
  * provided name, and the superclass constructor is invoked to initialize inherited properties.
- * </p>
  *
  * @param name The name of the sample, used as its unique identifier.
  * @param capacity The expected initial capacity of the {@link #alleles} map.
@@ -109,7 +103,6 @@ protected Sample(String name, int capacity) {
  * This method updates the {@link #alleles} map by setting the sequence type (allele)
  * for the specified feature. The feature is identified by its name, and the allele
  * is identified by its unique identifier.
- * </p>
  *
  * @param featureName The name of the feature ({@link Feature#name}) to associate with the allele.
  * @param alleleUid The unique identifier of the allele ({@link SequenceType#name}) to set for the feature.
@@ -124,7 +117,6 @@ protected void setAllele(String featureName, String alleleUid) {
  * This method looks up the allele identifier associated with the given feature name
  * in the {@link #alleles} map. If no allele is set for the specified feature, the method
  * returns the default reference value defined in {@link Constants#synonymous}.
- * </p>
  *
  * @param featureName The name of the feature ({@link Feature#name}) to retrieve the associated allele identifier for.
  * @return The allele identifier ({@link Feature.Allele#_uid}) associated with the feature, or the reference value if not set.
@@ -139,7 +131,6 @@ public String getAllele(String featureName) {
  * This method returns a collection view of the mappings contained in the {@link #alleles} map.
  * Each entry in the collection represents a feature name and its associated allele identifier.
  * Modifications to the returned collection will reflect in the underlying map.
- * </p>
  *
  * @return A {@link Collection} of {@link Map.Entry} objects representing the entries in the {@link #alleles} map.
  */
@@ -154,7 +145,6 @@ public Collection<Map.Entry<String, String>> getAlleles() {
  * The keys in the map represent the positions of the variants on the contig, and the values
  * are the corresponding variant call strings. If no variant calls exist for the specified
  * contig, an empty {@link TreeMap} is returned.
- * </p>
  *
  * @param contig The name of the contig to retrieve the variant calls for.
  * @return A {@link TreeMap} where the keys are variant positions and the values are variant call strings.
@@ -170,7 +160,6 @@ public TreeMap<Integer, String> getVariantCalls(String contig) {
  * the reference base character from the starting position. The call string is expected to follow
  * the structure defined in {@link Sample#variantCallPattern}, where fields are separated by semicolons,
  * commas, and colons.
- * </p>
  *
  * <pre>
  * Example call string: {@code 1;13;99;TTC:.:0:585,TTC:T--:13:0}
@@ -191,7 +180,6 @@ public static String getReferenceOfCall(String call) {
  * The attributes are formatted as key-value pairs separated by an equals sign (`=`) and delimited
  * by semicolons (`;`). If the last character of the generated string is a semicolon, it is removed
  * to ensure proper formatting.
- * </p>
  *
  * @return A {@link String} representing the sample, including its name and attributes.
  */
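
The `Sample` Javadoc above describes a call string whose fields are separated by semicolons, with the last field holding comma-separated `REF:ALT:AD:PL` entries. The sketch below splits the documented example along those delimiters. It is an assumption-laden illustration: `VariantCallSketch`, `AlleleEntry`, and `parseAlleleEntries` are hypothetical names, the three leading fields are left uninterpreted because their meaning is not visible in this diff, and the real class matches calls with `variantCallPattern` instead of `String.split`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for the documented call-string layout; not the project's code.
class VariantCallSketch {
    // One REF:ALT:AD:PL entry from the last semicolon-separated field.
    static final class AlleleEntry {
        final String ref;
        final String alt;
        final int depth;
        final String likelihood; // kept as text; its numeric form is not assumed here

        AlleleEntry(String ref, String alt, int depth, String likelihood) {
            this.ref = ref;
            this.alt = alt;
            this.depth = depth;
            this.likelihood = likelihood;
        }
    }

    static List<AlleleEntry> parseAlleleEntries(String call) {
        // Assumption: exactly three leading fields precede the allele entries,
        // as in the documented example.
        String[] fields = call.split(";");
        if (fields.length != 4) {
            throw new IllegalArgumentException("Expected 4 ';'-separated fields: " + call);
        }
        List<AlleleEntry> entries = new ArrayList<>();
        for (String entry : fields[3].split(",")) {
            String[] parts = entry.split(":");
            if (parts.length != 4) {
                throw new IllegalArgumentException("Expected REF:ALT:AD:PL, got: " + entry);
            }
            entries.add(new AlleleEntry(parts[0], parts[1], Integer.parseInt(parts[2]), parts[3]));
        }
        return entries;
    }
}
```

For the documented example `1;13;99;TTC:.:0:585,TTC:T--:13:0`, this produces two entries: `TTC` to `.` with depth 0, and `TTC` to `T--` with depth 13.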
