Fix typos (#358)

LeonHafner · web-flow · commit 00f2d9a7674c · 2025-04-22T05:11:41.000+02:00
* Update ir_profiling.ipynb

* Update ir_profiling.ipynb

* fix typos
diff --git a/jupyter-book/air_repertoire/clonotype.ipynb b/jupyter-book/air_repertoire/clonotype.ipynb
@@ -31,7 +31,7 @@
     "(air-repertoire-clonotype-key-takeaway-2)=\n",
     "## Gene segment usage and spectratype\n",
     "\n",
-    "The process shaping a T-cell or B-cell receptor by rearrangement of the V(D)J segments is thinking to generate random sequences and, in consequence, the distribution of V(D)J sequences should follow a uniform distribution. Nevertheless, it has been observed that V(D)J gene usage frequency is largely consistent across different individuals, which suggests a preference selection in terms of the V(D)J gene segments used {cite}`elhanati2014quantifying`. That allows the analysis of gene segment usage in terms of abundance of most used gene segments per cell type and frequency of most abundant segment per cell type per individual {cite}`chernyshev2021vdj`. Likewise, considering we know the amino aicd composition of the immune receptors for each cell, it is possible to identify the exact combinations of V(D)J segments of interest.\n",
+    "The process shaping a T-cell or B-cell receptor by rearrangement of the V(D)J segments is thought to generate random sequences and, in consequence, the distribution of V(D)J sequences would be expected to follow a uniform distribution. Nevertheless, it has been observed that V(D)J gene usage frequency is largely consistent across different individuals, which suggests a preferential selection of the V(D)J gene segments used {cite}`elhanati2014quantifying`. That allows the analysis of gene segment usage in terms of abundance of the most used gene segments per cell type and frequency of the most abundant segment per cell type per individual {cite}`chernyshev2021vdj`. Likewise, considering we know the amino acid composition of the immune receptors for each cell, it is possible to identify the exact combinations of V(D)J segments of interest.\n",
     "\n",
     "On the other hand, the recombination of V(D)J gene segments and the imprecise junction of V and J segments produce CDR3 regions with variable lengths. Spectratype analysis is seen as the measurement of the heterogeneity of CDR3 regions by their length diversity across the different cell types {cite}`ciupe2013quantification`. This measurement, in combination with clonal expansion and gene segment usage provides pieces of evidence to define well-described immunodominant clonotypes."
    ]
@@ -52,7 +52,7 @@
     "Here, as well as in the pre-processing step, we will use the utilities from the *Scirpy* library to perform the analysis and locate the results in the *AnnData* object.\n",
     "\n",
     ":::{warning}\n",
-    "Scirpy changed the format of [its datastructure](https://scirpy.scverse.org/en/latest/data-structure.html#storing-airr-rearrangement-data-in-anndata)\n",
+    "Scirpy changed the format of [its data structure](https://scirpy.scverse.org/en/latest/data-structure.html#storing-airr-rearrangement-data-in-anndata)\n",
     "with v0.13. While the overall analysis workflow has not changed, some outputs shown in this chapter might not be accurate anymore. \n",
     "\n",
     "See [the scirpy release notes](https://scirpy.scverse.org/en/latest/changelog.html#v0-13-0-new-data-structure-based-on-awkward-arrays) for more details about this change. \n",
@@ -354,7 +354,7 @@
    "source": [
     "The way to visualize the results is through a network where each node represents a clonotype (cluster of cells), and its size represents the number of cells detected in that cluster. They are labeled with a numerical ID, however, the order is given randomly, and it is not showing any additional information beyond to identify clonotypes of interest.\n",
     "\n",
-    "To generate the network, it is necessary to establish the layout to be plotted afterward. This parameter should be one of the igraph library layouts. Furthermore, it is recommended to set at least *min_cells* >=2 to avoid overcrowding the plot with singletons (clonotypes with only one cell as a member). Here, this parameter is established as >= 50 to show just the biggest clonotypes to easier the observation of the expected result."
+    "To generate the network, it is necessary to establish the layout to be plotted afterward. This parameter should be one of the igraph library layouts. Furthermore, it is recommended to set at least *min_cells* >=2 to avoid overcrowding the plot with singletons (clonotypes with only one cell as a member). Here, this parameter is established as >= 50 to show just the biggest clonotypes and make the observation of the expected result easier."
    ]
   },
   {
@@ -374,7 +374,7 @@
    "source": [
     "Now it is possible to plot the network. The result is just like the one you can observe below. As we said previously, each node (circle) represents a clonotype with a unique number as ID. Furthermore, the size represents the number of cells belonging to each specific clonotype.\n",
     "\n",
-    "On the other hand, we set the color according to the samples to observe if a clonotype appears in two or more samples, those clonotypes are called *public clonotypes* and are of high interest due to they represent shared immunological responses, and therefore they are candidates to explain general response over the disease/phenotype under study. Otherwise, there are *privet clonotypes* which represent patient/sample specific clonal response, and it could be interesting for analysis regarding personalized medicine. As you can see below, the highest clonotypes are composed just of private clonotypes."
+    "On the other hand, we set the color according to the samples to observe whether a clonotype appears in two or more samples. Such clonotypes are called *public clonotypes* and are of high interest as they represent shared immunological responses, and therefore they are candidates to explain a general response to the disease/phenotype under study. In contrast, *private clonotypes* represent patient/sample-specific clonal responses, which could be interesting for analyses regarding personalized medicine. As you can see below, the largest clonotypes are all private clonotypes."
    ]
   },
   {
@@ -2484,7 +2484,7 @@
     "\n",
     "We have identified key expanded clonotypes and the isotype they represented. In addition, we can explore spectratype to observe the dominance in terms of sequence length. As well as in the previous analysis, we discarded the multi-chain cells, and we conserved those clonotypes whose sizes were higher than 50 cells to keep the analysis consistency.\n",
     "\n",
-    "The plot below shown an interesting behaviour, despite the clear spectratype dominance reflected in our previous TCR analysis. Here, two sequence lengths raised, the first and the most dominant conformed by sequences of 23 aminoacids, and the second one composed by 15 aminoacids."
+    "The plot below shows an interesting behaviour, despite the clear spectratype dominance reflected in our previous TCR analysis. Here, two sequence lengths stand out: the first and most dominant consists of sequences of 23 amino acids, and the second of sequences of 15 amino acids."
    ]
   },
   {
@@ -2578,7 +2578,7 @@
    "id": "ec74f74a",
    "metadata": {},
    "source": [
-    "We can observe a clear dominance of the aminoacid proportion in almost all the positions, excepting for a couple of them we some additional aminoacids share the sequence motif landscape. However, the sequence composition for those V segments of interest with length 15 is quite stable.\n",
+    "We observe a clear dominance of certain amino acids at most positions, except for a few where additional amino acids contribute to the sequence motif landscape. Nonetheless, the sequence composition for the V segments of interest with a length of 15 remains relatively stable.\n",
     "\n",
     "![](../_static/images/air_repertoire/bcr_logo_motif.svg)\n",
     "\n",
diff --git a/jupyter-book/air_repertoire/ir_profiling.ipynb b/jupyter-book/air_repertoire/ir_profiling.ipynb
@@ -91,7 +91,7 @@
     "\n",
     "### Immune receptor sequencing\n",
     "\n",
-    "A common approach to discern V(D)J chains from single-cell isolations consist on computational reconstructions of different chains sequences based on full-length single-cell RNA sequencing, being Smart-seq2, a 5'-end RNA template based protocol, one of the widest implemented. Regarding computational methods, TRAPeS, TraCer, and VDJPuzzle are usually used to reconstruct TCR sequences based on scRNA-seq data, whereas BALDR {cite}`Upadhyay2018`, BASIC {cite}`canzar2016` and BraCer {cite}`Lindeman2018` were shown to robustly recover BCR sequences. However, they are prone to ignore the whole landscape of recombinatorial products and alternative splicing products in V(D)J region. Some alternatives have rise to deal with this problematic, RAGE-seq for example was developed to capture specific TCR and BCR fragments based on PCR templates designed for immune receptor sequencing and use long-read Oxford Nanopore to capture the whole sequence, whereas the rest of the cDNA is processed based on short-reads protocols provided by, for example, Illumina {cite}`singh2019high`. "
+    "A common approach to discern V(D)J chains from single-cell isolations consists of computational reconstructions of different chains' sequences based on full-length single-cell RNA sequencing, with Smart-seq2, a 5'-end RNA template based protocol, being one of the most widely implemented. Regarding computational methods, TRAPeS, TraCer, and VDJPuzzle are usually used to reconstruct TCR sequences based on scRNA-seq data, whereas BALDR {cite}`Upadhyay2018`, BASIC {cite}`canzar2016` and BraCer {cite}`Lindeman2018` were shown to robustly recover BCR sequences. However, they are prone to ignore the whole landscape of recombinatorial products and alternative splicing products in the V(D)J region. Some alternatives have arisen to deal with this problem. RAGE-seq, for example, was developed to capture specific TCR and BCR fragments based on PCR templates designed for immune receptor sequencing and uses long-read Oxford Nanopore sequencing to capture the whole sequence, whereas the rest of the cDNA is processed based on short-read protocols provided by, for example, Illumina {cite}`singh2019high`. "
    ]
   },
   {
@@ -102,7 +102,7 @@
     "## AIR repertoire analysis\n",
     "\n",
     "VDJ-sequencing provides us with the nucleotide and thereby also the protein sequence of the AIR paired for both chains, from which the V-, (D-,) J-, and C-gene is determined in addition to the CDR3 sequence. Overall, the AIR sequence determines the specificity of the individual B- and T-cell. Therefore, the information obtained by VDJ-sequencing provides us with an indicator of the cells' functionality, which is directly coupled to the AIRs target antigen. This enables us to use the AIR information in three major ways:\n",
-    "- **Phenotyping**: We can group immune cells by identifying cells with the same or similar AIR, which share the same specificity. Having these groups, we can now observe, how disease-specific cells react under different conditions (e.g. transcriptomic change upon stimulation), whether immune cells have proliferated, or how the diversity of an immune repertoire changes upon after an immune response.\n",
+    "- **Phenotyping**: We can group immune cells by identifying cells with the same or similar AIR, which share the same specificity. Having these groups, we can now observe how disease-specific cells react under different conditions (e.g. transcriptomic change upon stimulation), whether immune cells have proliferated, or how the diversity of an immune repertoire changes after an immune response.\n",
     "- **Sequence Analysis**: Having identified groups of AIRs (e.g. a reactive cluster detected in other modalities), we can extract properties of their sequence, such as V-, D-, and J-, gene usage or enriched sequence motifs, that are related to specific diseases or therapies.\n",
     "- **Specificity-Inference**: Last, we can use the sequence to match AIRs to their target antigen via database queries, sequence distances, or predictors. This directly identifies cells reactive to specific infectious diseases, tumors, or self-antigens. \n"
    ]
@@ -260,7 +260,7 @@
    "metadata": {},
    "source": [
     "### Raw data\n",
-    "We begin by with viewing the raw output of the cell ranger pipeline for a better understanding of the data we are working with.\n",
+    "We begin by viewing the raw output of the cell ranger pipeline for a better understanding of the data we are working with.\n",
     "We will load the `filtered_contig_annotations.csv\"` file to view its content. Each row will represent one measurement of a sequence."
    ]
   },
@@ -3033,9 +3033,9 @@
      "text": [
       "Amount of all B cells:\t\t\t\t159446\n",
       "Amount of B cells with AIR:\t\t\t159185\n",
-      "Amount of B cells without dublets:\t\t159185\n",
+      "Amount of B cells without doublets:\t\t159185\n",
       "Amount of B cells with unique AIR per cell:\t153936\n",
-      "Amount of B cells with sinlge complete AIR:\t108395\n"
+      "Amount of B cells with single complete AIR:\t108395\n"
      ]
     }
    ],
@@ -3045,7 +3045,7 @@
     "print(f\"Amount of B cells with AIR:\\t\\t\\t{len(adata_bcr_tmp)}\")\n",
     "\n",
     "adata_bcr_tmp = adata_bcr_tmp[adata_bcr_tmp.obs[\"chain_pairing\"] != \"multi_chain\"]\n",
-    "print(f\"Amount of B cells without dublets:\\t\\t{len(adata_bcr_tmp)}\")\n",
+    "print(f\"Amount of B cells without doublets:\\t\\t{len(adata_bcr_tmp)}\")\n",
     "\n",
     "adata_bcr_tmp = adata_bcr_tmp[\n",
     "    ~adata_bcr_tmp.obs[\"chain_pairing\"].isin(\n",
@@ -3136,7 +3136,7 @@
      "text": [
       "Amount of all T cells:\t\t\t\t280045\n",
       "Amount of T cells with AIR:\t\t\t280023\n",
-      "Amount of T cells without dublets:\t\t280023\n",
+      "Amount of T cells without doublets:\t\t280023\n",
       "Amount of T cells with unique AIR per cell:\t250160\n",
       "Amount of T cells with sinlge complete AIR:\t196957\n"
      ]
@@ -3148,7 +3148,7 @@
     "print(f\"Amount of T cells with AIR:\\t\\t\\t{len(adata_tcr_tmp)}\")\n",
     "\n",
     "adata_tcr_tmp = adata_tcr_tmp[adata_tcr_tmp.obs[\"chain_pairing\"] != \"multi_chain\"]\n",
-    "print(f\"Amount of T cells without dublets:\\t\\t{len(adata_tcr_tmp)}\")\n",
+    "print(f\"Amount of T cells without doublets:\\t\\t{len(adata_tcr_tmp)}\")\n",
     "\n",
     "adata_tcr_tmp = adata_tcr_tmp[\n",
     "    ~adata_tcr_tmp.obs[\"chain_pairing\"].isin(\n",