
Commit e1a3cc0

1 parent c3a4b01 commit e1a3cc0


3 files changed (+74 additions, -1 deletion)


docs/dev/generated/skbio.io.format.fasta.html

Lines changed: 60 additions & 0 deletions
@@ -1310,6 +1310,65 @@ <h3>Reading and Writing FASTA/QUAL Files<a class="headerlink" href="#reading-and
13101310
</pre></div>
13111311
</div>
13121312
</section>
1313+
<section id="reading-multi-fasta-files">
1314+
<h3>Reading multi-FASTA Files<a class="headerlink" href="#reading-multi-fasta-files" title="Link to this heading">#</a></h3>
1315+
<p>Suppose you have a multi-FASTA file (a single file containing several FASTA records) and want to read each record into a <code class="docutils literal notranslate"><span class="pre">DNA</span></code>
1316+
object, collecting the results in a list. Here we use <code class="docutils literal notranslate"><span class="pre">io.StringIO</span></code> to create a mock FASTA file in
1317+
memory.</p>
1318+
<div class="doctest highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">io</span> <span class="kn">import</span> <span class="n">StringIO</span>
1319+
<span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">skbio</span>
1320+
<span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">skbio.sequence</span> <span class="kn">import</span> <span class="n">DNA</span>
1321+
<span class="gp">&gt;&gt;&gt; </span><span class="n">fl</span> <span class="o">=</span> <span class="s2">&quot;&gt;seq1 Turkey</span><span class="se">\n</span><span class="s2">&quot;</span> <span class="o">+</span>\
1322+
<span class="gp">... </span> <span class="s2">&quot;AAGCTNGGGCATTTCAGGGTGAGCCCGGGCAATACAGGGTAT</span><span class="se">\n</span><span class="s2">&quot;</span> <span class="o">+</span>\
1323+
<span class="gp">... </span> <span class="s2">&quot;&gt;seq2 Salmo gair</span><span class="se">\n</span><span class="s2">&quot;</span> <span class="o">+</span>\
1324+
<span class="gp">... </span> <span class="s2">&quot;AAGCCTTGGCAGTGCAGGGTGAGCCGTGG</span><span class="se">\n</span><span class="s2">&quot;</span> <span class="o">+</span>\
1325+
<span class="gp">... </span> <span class="s2">&quot;CCGGGCACGGTAT</span><span class="se">\n</span><span class="s2">&quot;</span> <span class="o">+</span>\
1326+
<span class="gp">... </span> <span class="s2">&quot;&gt;seq3 H. Sapiens</span><span class="se">\n</span><span class="s2">&quot;</span> <span class="o">+</span>\
1327+
<span class="gp">... </span> <span class="s2">&quot;ACCGGTTGGCCGTTCAGGGTACAGGTTGGCCGTTCAGGGTAA</span><span class="se">\n</span><span class="s2">&quot;</span> <span class="o">+</span>\
1328+
<span class="gp">... </span> <span class="s2">&quot;&gt;seq4 Chimp</span><span class="se">\n</span><span class="s2">&quot;</span> <span class="o">+</span>\
1329+
<span class="gp">... </span> <span class="s2">&quot;AAACCCTTGCCG</span><span class="se">\n</span><span class="s2">&quot;</span> <span class="o">+</span>\
1330+
<span class="gp">... </span> <span class="s2">&quot;TTACGCTTAAAC</span><span class="se">\n</span><span class="s2">&quot;</span> <span class="o">+</span>\
1331+
<span class="gp">... </span> <span class="s2">&quot;CGAGGCCGGGAC</span><span class="se">\n</span><span class="s2">&quot;</span> <span class="o">+</span>\
1332+
<span class="gp">... </span> <span class="s2">&quot;ACTCAT</span><span class="se">\n</span><span class="s2">&quot;</span> <span class="o">+</span>\
1333+
<span class="gp">... </span> <span class="s2">&quot;&gt;seq5 Gorilla</span><span class="se">\n</span><span class="s2">&quot;</span> <span class="o">+</span>\
1334+
<span class="gp">... </span> <span class="s2">&quot;AAACCCTTGCCGGTACGCTTAAACCATTGCCGGTACGCTTAA</span><span class="se">\n</span><span class="s2">&quot;</span>
1335+
<span class="gp">&gt;&gt;&gt; </span><span class="n">mock_fl</span> <span class="o">=</span> <span class="n">StringIO</span><span class="p">(</span><span class="n">fl</span><span class="p">)</span>
1336+
</pre></div>
1337+
</div>
1338+
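<p>The following code reads every record into a list of <code class="docutils literal notranslate"><span class="pre">DNA</span></code> objects. Because <code class="docutils literal notranslate"><span class="pre">skbio.io.read</span></code> returns a generator here, the call is wrapped in <code class="docutils literal notranslate"><span class="pre">list</span></code>. In practice, <code class="docutils literal notranslate"><span class="pre">mock_fl</span></code>
1339+
can be replaced with an open file handle or a path to the file.</p>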
1340+
<div class="doctest highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">res</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">skbio</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">mock_fl</span><span class="p">,</span> <span class="nb">format</span><span class="o">=</span><span class="s2">&quot;fasta&quot;</span><span class="p">,</span> <span class="n">constructor</span><span class="o">=</span><span class="n">DNA</span><span class="p">))</span>
1341+
<span class="gp">&gt;&gt;&gt; </span><span class="n">res</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
1342+
<span class="go">DNA</span>
1343+
<span class="go">------------------------------------------------</span>
1344+
<span class="go">Metadata:</span>
1345+
<span class="go"> &#39;description&#39;: &#39;Turkey&#39;</span>
1346+
<span class="go"> &#39;id&#39;: &#39;seq1&#39;</span>
1347+
<span class="go">Stats:</span>
1348+
<span class="go"> length: 42</span>
1349+
<span class="go"> has gaps: False</span>
1350+
<span class="go"> has degenerates: True</span>
1351+
<span class="go"> has definites: True</span>
1352+
<span class="go"> GC-content: 54.76%</span>
1353+
<span class="go">------------------------------------------------</span>
1354+
<span class="go">0 AAGCTNGGGC ATTTCAGGGT GAGCCCGGGC AATACAGGGT AT</span>
1355+
<span class="gp">&gt;&gt;&gt; </span><span class="n">res</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
1356+
<span class="go">DNA</span>
1357+
<span class="go">------------------------------------------------</span>
1358+
<span class="go">Metadata:</span>
1359+
<span class="go"> &#39;description&#39;: &#39;Salmo gair&#39;</span>
1360+
<span class="go"> &#39;id&#39;: &#39;seq2&#39;</span>
1361+
<span class="go">Stats:</span>
1362+
<span class="go"> length: 42</span>
1363+
<span class="go"> has gaps: False</span>
1364+
<span class="go"> has degenerates: False</span>
1365+
<span class="go"> has definites: True</span>
1366+
<span class="go"> GC-content: 66.67%</span>
1367+
<span class="go">------------------------------------------------</span>
1368+
<span class="go">0 AAGCCTTGGC AGTGCAGGGT GAGCCGTGGC CGGGCACGGT AT</span>
1369+
</pre></div>
1370+
</div>
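<p>Note that <code class="docutils literal notranslate"><span class="pre">seq2</span></code> spans two lines in the mock file but comes back as a single 42-nucleotide sequence. The minimal pure-Python sketch below illustrates how multi-FASTA records can be grouped: each line starting with <code class="docutils literal notranslate"><span class="pre">&gt;</span></code> begins a new record, the header is split into an id and a description at the first whitespace (matching the metadata shown above), and the sequence lines that follow are joined until the next header. The helper <code class="docutils literal notranslate"><span class="pre">iter_fasta</span></code> is hypothetical and for illustration only; it is not scikit-bio’s parser, which additionally builds <code class="docutils literal notranslate"><span class="pre">DNA</span></code> objects and attaches their metadata for you.</p>

```python
from io import StringIO
from typing import Iterator, List, Optional, TextIO, Tuple


def _make_record(header: str, chunks: List[str]) -> Tuple[str, str, str]:
    # The id is the text before the first whitespace; the rest is the description.
    parts = header.split(None, 1)
    seq_id = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    # All sequence lines belonging to one record are concatenated.
    return seq_id, description, "".join(chunks)


def iter_fasta(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (id, description, sequence) tuples, one record at a time.

    Hypothetical illustration only; not scikit-bio's implementation.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                yield _make_record(header, chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            # Sequence lines before the first header are ignored in this sketch.
            chunks.append(line)
    if header is not None:
        yield _make_record(header, chunks)


fl = (">seq1 Turkey\n"
      "AAGCTNGGGCATTTCAGGGTGAGCCCGGGCAATACAGGGTAT\n"
      ">seq2 Salmo gair\n"
      "AAGCCTTGGCAGTGCAGGGTGAGCCGTGG\n"
      "CCGGGCACGGTAT\n")

for seq_id, description, seq in iter_fasta(StringIO(fl)):
    print(seq_id, description, len(seq))
# Prints "seq1 Turkey 42" and then "seq2 Salmo gair 42".
```

<p>Because <code class="docutils literal notranslate"><span class="pre">iter_fasta</span></code> is a generator, a record is only yielded once the next header (or the end of the file) is reached, so at most one record’s lines are buffered at a time.</p>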
1371+
</section>
13131372
</section>
13141373
<section id="references">
13151374
<h2>References<a class="headerlink" href="#references" title="Link to this heading">#</a></h2>
@@ -1418,6 +1477,7 @@ <h2>References<a class="headerlink" href="#references" title="Link to this headi
14181477
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#examples">Examples</a><ul class="nav section-nav flex-column">
14191478
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#reading-and-writing-fasta-files">Reading and Writing FASTA Files</a></li>
14201479
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#reading-and-writing-fasta-qual-files">Reading and Writing FASTA/QUAL Files</a></li>
1480+
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#reading-multi-fasta-files">Reading multi-FASTA Files</a></li>
14211481
</ul>
14221482
</li>
14231483
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#references">References</a></li>

docs/dev/io.html

Lines changed: 13 additions & 0 deletions
@@ -983,6 +983,18 @@ <h3>Writing files from scikit-bio<a class="headerlink" href="#writing-files-from
983983
not know how you want to serialize an object. OO interfaces define a default
984984
<code class="docutils literal notranslate"><span class="pre">format</span></code>, so it may not be necessary to include it.</p>
985985
</section>
986+
<section id="streaming-files-with-read-and-write">
987+
<h3>Streaming files with read and write<a class="headerlink" href="#streaming-files-with-read-and-write" title="Link to this heading">#</a></h3>
988+
<p>When working with particularly large files, it is often preferable to stream them rather than load them into memory all at once.
989+
Scikit-bio’s <code class="docutils literal notranslate"><span class="pre">io</span></code> module lets you construct a streaming pipeline from
990+
the <code class="docutils literal notranslate"><span class="pre">read</span></code> and <code class="docutils literal notranslate"><span class="pre">write</span></code> functions.</p>
991+
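<p>When no <code class="docutils literal notranslate"><span class="pre">into</span></code> type is given, <code class="docutils literal notranslate"><span class="pre">skbio.io.read</span></code> returns a generator, which can be passed directly to <code class="docutils literal notranslate"><span class="pre">skbio.io.write</span></code>
992+
so that records are written one at a time as the generator yields them, instead of all being held in memory first.</p>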
993+
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">seq_gen</span> <span class="o">=</span> <span class="n">skbio</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">big_file</span><span class="p">,</span> <span class="nb">format</span><span class="o">=</span><span class="s1">&#39;someformat&#39;</span><span class="p">)</span>
994+
<span class="n">skbio</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">seq_gen</span><span class="p">,</span> <span class="n">into</span><span class="o">=</span><span class="n">write_file</span><span class="p">,</span> <span class="nb">format</span><span class="o">=</span><span class="s1">&#39;someformat&#39;</span><span class="p">)</span>
995+
</pre></div>
996+
</div>
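<p>The read/write pattern above relies on Python generators: each record is produced only when the writer asks for the next one. The pure-Python sketch below illustrates that interleaving with a toy FASTA reader and writer. The helpers <code class="docutils literal notranslate"><span class="pre">read_records</span></code> and <code class="docutils literal notranslate"><span class="pre">write_records</span></code> are hypothetical and not part of scikit-bio, <code class="docutils literal notranslate"><span class="pre">big_file</span></code> and <code class="docutils literal notranslate"><span class="pre">write_file</span></code> are in-memory stand-ins for real files, and the <code class="docutils literal notranslate"><span class="pre">log</span></code> list exists only to make the order of operations visible.</p>

```python
from io import StringIO
from typing import Iterable, Iterator, List, TextIO, Tuple

Record = Tuple[str, str]  # (header, sequence)


def read_records(handle: TextIO, log: List[str]) -> Iterator[Record]:
    """Lazily yield (header, sequence) pairs from FASTA text (hypothetical helper)."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                log.append(f"read {header}")
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        log.append(f"read {header}")
        yield header, "".join(chunks)


def write_records(records: Iterable[Record], out: TextIO, log: List[str]) -> None:
    """Write each record as soon as it is yielded (hypothetical helper)."""
    for header, seq in records:
        out.write(f">{header}\n{seq}\n")
        log.append(f"wrote {header}")


big_file = StringIO(">s1\nACGT\nAC\n>s2\nGGCC\n")
write_file = StringIO()
log: List[str] = []
write_records(read_records(big_file, log), write_file, log)
print(log)  # ['read s1', 'wrote s1', 'read s2', 'wrote s2']
```

<p>The log shows reads and writes alternating rather than all reads happening first, which is the property that keeps memory use bounded by roughly one record regardless of file size.</p>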
997+
</section>
986998
</section>
987999
</section>
9881000

@@ -1042,6 +1054,7 @@ <h3>Writing files from scikit-bio<a class="headerlink" href="#writing-files-from
10421054
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#what-kinds-of-files-scikit-bio-can-use">What kinds of files scikit-bio can use</a></li>
10431055
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#reading-files-into-scikit-bio">Reading files into scikit-bio</a></li>
10441056
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#writing-files-from-scikit-bio">Writing files from scikit-bio</a></li>
1057+
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#streaming-files-with-read-and-write">Streaming files with read and write</a></li>
10451058
</ul>
10461059
</li>
10471060
</ul>

docs/dev/searchindex.js

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.
