Commit 2b6691b

1 parent aef1443

3 files changed: 49 additions, 18 deletions

docs/dev/io.html

Lines changed: 48 additions & 17 deletions
@@ -682,6 +682,20 @@
682682
<section id="input-and-output-skbio-io">
683683
<span id="module-skbio.io"></span><h1>Input and Output (<a class="reference internal" href="#module-skbio.io" title="skbio.io"><code class="xref py py-mod docutils literal notranslate"><span class="pre">skbio.io</span></code></a>)<a class="headerlink" href="#input-and-output-skbio-io" title="Link to this heading">#</a></h1>
684684
<p>This module provides input/output (I/O) functionality for scikit-bio.</p>
685+
<p>In bioinformatics there are many different file formats, and in scikit-bio there are
686+
many different classes which can read and write these formats. The many-to-many
687+
nature of the relationships between scikit-bio objects and file formats inspired
688+
the creation of the scikit-bio <code class="docutils literal notranslate"><span class="pre">io</span></code> module, which manages these relationships
689+
transparently.</p>
690+
<p>For general guidance on reading and writing files and working with scikit-bio objects,
691+
see the <a class="reference internal" href="#tutorial"><span class="std std-ref">Tutorial</span></a> section and the
692+
<a class="reference external" href="https://github.com/scikit-bio/scikit-bio-cookbook/blob/master/Reading%20and%20writing%20files.ipynb">Reading and writing files</a>
693+
notebook. For guidance on a specific format or scikit-bio object,
694+
see the documentation for that format or object.</p>
695+
<p>See the
696+
<a class="reference external" href="../../docs/latest/generated/skbio.io.registry.html#creating-a-new-format-for-scikit-bio">IORegistry docs</a>
697+
for guidance on creating custom formats and registering custom readers, writers, and
698+
sniffers.</p>
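The many-to-many idea behind the registry can be illustrated with a toy sketch in plain Python. Everything below (`ToyRegistry`, `register_reader`, `Labels`, `LabelCounts`, and the `lines`/`csv` formats) is hypothetical and is not the `IORegistry` API; it only shows how keying readers on a (format, class) pair lets one format feed several classes and one class accept several formats.

```python
from io import StringIO


class ToyRegistry:
    """Hypothetical stand-in for a format registry (not skbio's IORegistry)."""

    def __init__(self):
        # (format name, target class) -> reader function
        self._readers = {}

    def register_reader(self, fmt, cls):
        """Return a decorator that records a reader for one (format, class) pair."""
        def decorator(func):
            self._readers[(fmt, cls)] = func
            return func
        return decorator

    def read(self, fh, format, into):
        """Look up and call the reader registered for (format, into)."""
        reader = self._readers.get((format, into))
        if reader is None:
            raise ValueError(f"no reader for {format!r} into {into.__name__}")
        return reader(fh)


class Labels(list):
    """Hypothetical in-memory class: an ordered list of labels."""


class LabelCounts(dict):
    """Hypothetical in-memory class: label -> number of occurrences."""


registry = ToyRegistry()


@registry.register_reader("lines", Labels)
def _lines_to_labels(fh):
    # One label per non-blank line.
    return Labels(line.strip() for line in fh if line.strip())


@registry.register_reader("lines", LabelCounts)
def _lines_to_counts(fh):
    # Same format, different target class: count each label.
    counts = LabelCounts()
    for line in fh:
        label = line.strip()
        if label:
            counts[label] = counts.get(label, 0) + 1
    return counts


@registry.register_reader("csv", Labels)
def _csv_to_labels(fh):
    # Different format, same target class: comma-separated labels.
    return Labels(item.strip() for item in fh.read().split(",") if item.strip())


data = "a\nb\na\n"
print(registry.read(StringIO(data), format="lines", into=Labels))       # ['a', 'b', 'a']
print(registry.read(StringIO(data), format="lines", into=LabelCounts))  # {'a': 2, 'b': 1}
print(registry.read(StringIO("a, b"), format="csv", into=Labels))       # ['a', 'b']
```

Callers only name the format and the target class; which function does the work is decided by the registry, which is the role the real registry plays for scikit-bio's many formats and classes.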
685699
<section id="supported-file-formats">
686700
<h2>Supported file formats<a class="headerlink" href="#supported-file-formats" title="Link to this heading">#</a></h2>
687701
<p>scikit-bio provides parsers for the following file formats. For details on what objects
@@ -861,7 +875,7 @@ <h2>Exceptions and warnings<a class="headerlink" href="#exceptions-and-warnings"
861875
</div>
862876
</section>
863877
<section id="tutorial">
864-
<h2>Tutorial<a class="headerlink" href="#tutorial" title="Link to this heading">#</a></h2>
878+
<span id="id1"></span><h2>Tutorial<a class="headerlink" href="#tutorial" title="Link to this heading">#</a></h2>
865879
<p>Reading and writing files (I/O) can be a complicated task:</p>
866880
<ul class="simple">
867881
<li><p>A file format can sometimes be read into more than one in-memory representation
@@ -880,7 +894,7 @@ <h2>Tutorial<a class="headerlink" href="#tutorial" title="Link to this heading">
880894
</ul>
881895
<p>To address these issues (and others), scikit-bio provides a simple, powerful
882896
interface for dealing with I/O. We accomplish this by using a single I/O
883-
registry.</p>
897+
registry defined in <a class="reference internal" href="generated/skbio.io.registry.IORegistry.html#skbio.io.registry.IORegistry" title="skbio.io.registry.IORegistry"><code class="xref py py-class docutils literal notranslate"><span class="pre">skbio.io.registry.IORegistry</span></code></a>.</p>
884898
<section id="what-kinds-of-files-scikit-bio-can-use">
885899
<h3>What kinds of files scikit-bio can use<a class="headerlink" href="#what-kinds-of-files-scikit-bio-can-use" title="Link to this heading">#</a></h3>
886900
<p>To see a complete list of file-like inputs that can be used for reading,
@@ -893,22 +907,34 @@ <h3>Reading files into scikit-bio<a class="headerlink" href="#reading-files-into
893907
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">my_obj</span> <span class="o">=</span> <span class="n">skbio</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">file</span><span class="p">,</span> <span class="nb">format</span><span class="o">=</span><span class="s1">&#39;someformat&#39;</span><span class="p">,</span> <span class="n">into</span><span class="o">=</span><span class="n">SomeSkbioClass</span><span class="p">)</span>
894908
</pre></div>
895909
</div>
896-
<p>The second is to use the object-oriented (OO) interface which is automatically
897-
constructed from the procedural interface:</p>
910+
<p>Here, <code class="docutils literal notranslate"><span class="pre">file</span></code> can be a path to a file, a file handle, or any of the other
911+
objects with read support listed in the <a class="reference internal" href="generated/skbio.io.util.open.html#skbio.io.util.open" title="skbio.io.util.open"><code class="xref py py-func docutils literal notranslate"><span class="pre">skbio.io.util.open()</span></code></a> documentation.</p>
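To show what accepting "a path or a file handle" involves, here is a minimal standard-library sketch of a hypothetical helper, `open_text`. It is not `skbio.io.util.open()`, which supports more input types; it only illustrates the dispatch: open (and later close) anything that looks like a path, but pass caller-owned handles through untouched.

```python
import os
import tempfile
from contextlib import contextmanager
from io import StringIO


@contextmanager
def open_text(file):
    """Hypothetical helper: yield a readable text handle for a path or a handle."""
    if isinstance(file, (str, os.PathLike)):
        # We opened it, so we are responsible for closing it.
        with open(file, encoding="utf-8") as fh:
            yield fh
    else:
        # Caller-owned handle (e.g. StringIO or an open file): leave it open.
        yield file


def count_lines(file):
    """Example consumer that works with either kind of input."""
    with open_text(file) as fh:
        return sum(1 for _ in fh)


# An in-memory file handle ...
print(count_lines(StringIO("(a, b);\n")))  # 1

# ... or a path to a file on disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tree.nwk")
    with open(path, "w", encoding="utf-8") as out:
        out.write("(a, b);\n(c, d);\n")
    print(count_lines(path))  # 2
```

Not closing caller-owned handles matters in practice: the caller may want to keep reading from, or rewind, the same handle afterwards.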
912+
<p>The second way to read files is to use the object-oriented interface, which is
913+
automatically constructed from the procedural interface:</p>
898914
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">my_obj</span> <span class="o">=</span> <span class="n">SomeSkbioClass</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">file</span><span class="p">,</span> <span class="nb">format</span><span class="o">=</span><span class="s1">&#39;someformat&#39;</span><span class="p">)</span>
899915
</pre></div>
900916
</div>
901-
<p>For example, to read a <code class="docutils literal notranslate"><span class="pre">newick</span></code> file using both interfaces you would type:</p>
902-
<div class="doctest highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">skbio</span> <span class="kn">import</span> <span class="n">read</span>
917+
<div class="admonition note">
918+
<p class="admonition-title">Note</p>
919+
<p>A very common use case in bioinformatics is to read multi-line FASTA and
920+
FASTQ files. For examples of how to achieve this with scikit-bio, please see the
921+
<a class="reference external" href="../../docs/dev/generated/skbio.io.format.fasta.html#examples">FASTA documentation</a>
922+
or the
923+
<a class="reference external" href="../../docs/dev/generated/skbio.io.format.fastq.html#examples">FASTQ documentation</a>.</p>
924+
</div>
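To clarify what "multi-line" means here, the following is a minimal, standard-library-only sketch that parses FASTA records whose sequences wrap across several lines. The function name `iter_fasta` is hypothetical and this is not how scikit-bio implements its FASTA reader; it simply skips blank lines and performs no sequence validation.

```python
from io import StringIO


def iter_fasta(fh):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in fh:
        line = line.strip()
        if not line:
            continue  # this sketch simply ignores blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            if header is None:
                raise ValueError("sequence data found before the first header")
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # flush the last record


fasta = StringIO(">seq1 first\nACGT\nGGCC\n>seq2\nTTAA\n")
for header, seq in iter_fasta(fasta):
    print(header, seq)
# seq1 first ACGTGGCC
# seq2 TTAA
```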
925+
<p>As an example, let’s read a <a class="reference internal" href="generated/skbio.io.format.newick.html#module-skbio.io.format.newick" title="skbio.io.format.newick"><code class="xref py py-mod docutils literal notranslate"><span class="pre">newick</span></code></a> file into a
926+
<a class="reference internal" href="generated/skbio.tree.TreeNode.html#skbio.tree.TreeNode" title="skbio.tree.TreeNode"><code class="xref py py-class docutils literal notranslate"><span class="pre">TreeNode</span></code></a> object using both interfaces. Here we will use Python’s
927+
built-in <a class="reference external" href="https://docs.python.org/3/library/io.html#io.StringIO" title="(in Python v3.13)"><code class="xref py py-class docutils literal notranslate"><span class="pre">StringIO</span></code></a> class to mimic an open file:</p>
928+
<div class="doctest highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">skbio</span> <span class="kn">import</span> <span class="n">read</span> <span class="k">as</span> <span class="n">sk_read</span>
903929
<span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">skbio</span> <span class="kn">import</span> <span class="n">TreeNode</span>
904930
<span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">io</span> <span class="kn">import</span> <span class="n">StringIO</span>
905931
<span class="gp">&gt;&gt;&gt; </span><span class="n">open_filehandle</span> <span class="o">=</span> <span class="n">StringIO</span><span class="p">(</span><span class="s1">&#39;(a, b);&#39;</span><span class="p">)</span>
906-
<span class="gp">&gt;&gt;&gt; </span><span class="n">tree</span> <span class="o">=</span> <span class="n">read</span><span class="p">(</span><span class="n">open_filehandle</span><span class="p">,</span> <span class="nb">format</span><span class="o">=</span><span class="s1">&#39;newick&#39;</span><span class="p">,</span> <span class="n">into</span><span class="o">=</span><span class="n">TreeNode</span><span class="p">)</span>
932+
<span class="gp">&gt;&gt;&gt; </span><span class="n">tree</span> <span class="o">=</span> <span class="n">sk_read</span><span class="p">(</span><span class="n">open_filehandle</span><span class="p">,</span> <span class="nb">format</span><span class="o">=</span><span class="s1">&#39;newick&#39;</span><span class="p">,</span> <span class="n">into</span><span class="o">=</span><span class="n">TreeNode</span><span class="p">)</span>
907933
<span class="gp">&gt;&gt;&gt; </span><span class="n">tree</span>
908934
<span class="go">&lt;TreeNode, name: unnamed, internal node count: 0, tips count: 2&gt;</span>
909935
</pre></div>
910936
</div>
911-
<p>For the OO interface:</p>
937+
<p>Or, using the object-oriented interface:</p>
912938
<div class="doctest highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">open_filehandle</span> <span class="o">=</span> <span class="n">StringIO</span><span class="p">(</span><span class="s1">&#39;(a, b);&#39;</span><span class="p">)</span>
913939
<span class="gp">&gt;&gt;&gt; </span><span class="n">tree</span> <span class="o">=</span> <span class="n">TreeNode</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">open_filehandle</span><span class="p">,</span> <span class="nb">format</span><span class="o">=</span><span class="s1">&#39;newick&#39;</span><span class="p">)</span>
914940
<span class="gp">&gt;&gt;&gt; </span><span class="n">tree</span>
@@ -919,9 +945,9 @@ <h3>Reading files into scikit-bio<a class="headerlink" href="#reading-files-into
919945
generator will be returned. What the generator yields will depend on what
920946
format is being read.</p>
921947
<p>When <code class="docutils literal notranslate"><span class="pre">into</span></code> is provided, format may be omitted and the registry will use its
922-
knowledge of the available formats for the requested class to infer the correct
923-
format. This format inference is also available in the OO interface, meaning
924-
that <code class="docutils literal notranslate"><span class="pre">format</span></code> may be omitted there as well.</p>
948+
knowledge of the available formats for the requested class to infer (sniff) the
949+
correct format. This format inference is also available in the object-oriented
950+
interface, meaning that <code class="docutils literal notranslate"><span class="pre">format</span></code> may be omitted there as well.</p>
925951
<p>As an example:</p>
926952
<div class="doctest highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">open_filehandle</span> <span class="o">=</span> <span class="n">StringIO</span><span class="p">(</span><span class="s1">&#39;(a, b);&#39;</span><span class="p">)</span>
927953
<span class="gp">&gt;&gt;&gt; </span><span class="n">tree</span> <span class="o">=</span> <span class="n">TreeNode</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">open_filehandle</span><span class="p">)</span>
@@ -936,7 +962,8 @@ <h3>Reading files into scikit-bio<a class="headerlink" href="#reading-files-into
936962
<div class="admonition note">
937963
<p class="admonition-title">Note</p>
938964
<p>There is a built-in <code class="docutils literal notranslate"><span class="pre">sniffer</span></code> which results in a useful error message
939-
if an empty file is provided as input and the format was omitted.</p>
965+
if an empty file is provided as input and the format was omitted. See the
966+
<a class="reference external" href="../../docs/dev/generated/skbio.io.registry.sniff.html">sniff documentation</a> for more information.</p>
940967
</div>
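The idea behind format inference can be sketched as follows: each candidate format has a small test that peeks at the start of the input, an empty input is rejected up front with a clear message, and the stream position is restored afterwards so the chosen reader still sees the whole file. The names (`SniffError`, `guess_format`) and the one-character detection rules are hypothetical simplifications, not scikit-bio's actual sniffers.

```python
from io import StringIO


class SniffError(Exception):
    """Hypothetical error raised when no single format matches."""


def _peek_first_line(fh):
    """Return the first non-blank line (stripped), then rewind the handle."""
    start = fh.tell()
    try:
        line = fh.readline()
        while line and not line.strip():
            line = fh.readline()
        return line.strip()
    finally:
        fh.seek(start)  # sniffing must not consume the input


# Deliberately simplified detection rules, one per format.
SNIFFERS = {
    "fasta": lambda line: line.startswith(">"),
    "fastq": lambda line: line.startswith("@"),
    "newick": lambda line: line.startswith("("),
}


def guess_format(fh):
    """Infer a format name from the start of an open text handle."""
    first = _peek_first_line(fh)
    if not first:
        # Dedicated empty-input check gives a clearer message.
        raise SniffError("cannot infer format: input is empty")
    matches = [name for name, looks_like in SNIFFERS.items() if looks_like(first)]
    if len(matches) != 1:
        raise SniffError(f"cannot infer format; candidates: {matches}")
    return matches[0]


fh = StringIO("(a, b);")
print(guess_format(fh))  # newick
print(fh.read())         # (a, b);  (the position was restored)
try:
    guess_format(StringIO(""))
except SniffError as err:
    print(err)           # cannot infer format: input is empty
```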
941968
</section>
942969
<section id="writing-files-from-scikit-bio">
@@ -946,19 +973,23 @@ <h3>Writing files from scikit-bio<a class="headerlink" href="#writing-files-from
946973
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">skbio</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">my_obj</span><span class="p">,</span> <span class="nb">format</span><span class="o">=</span><span class="s1">&#39;someformat&#39;</span><span class="p">,</span> <span class="n">into</span><span class="o">=</span><span class="n">file</span><span class="p">)</span>
947974
</pre></div>
948975
</div>
949-
<p>OO Interface:</p>
976+
<p>Object-oriented Interface:</p>
950977
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">my_obj</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">file</span><span class="p">,</span> <span class="nb">format</span><span class="o">=</span><span class="s1">&#39;someformat&#39;</span><span class="p">)</span>
951978
</pre></div>
952979
</div>
953980
<p>In the procedural interface, <code class="docutils literal notranslate"><span class="pre">format</span></code> is required. Without it, scikit-bio does
954-
not know how you want to serialize an object. OO interfaces define a default
955-
<code class="docutils literal notranslate"><span class="pre">format</span></code>, so it may not be necessary to include it.</p>
981+
not know how you want to serialize an object. Object-oriented interfaces define a
982+
default <code class="docutils literal notranslate"><span class="pre">format</span></code>, so it may not be necessary to include it.</p>
983+
<p>For more information on writing to a specific file format, please see that format’s
984+
documentation page.</p>
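The difference between the two writing styles can be sketched with a toy class. The names `write_obj`, `WRITERS`, `Labels`, and `DEFAULT_FORMAT` are hypothetical and do not reflect scikit-bio's internals; the point is only that a method on a class can fall back on a format the class declares, whereas a standalone function has nothing to fall back on, so `format` must be passed explicitly.

```python
from io import StringIO

WRITERS = {}  # format name -> writer function


def write_obj(obj, format, into):
    """Procedural style: the caller must say which format to use."""
    WRITERS[format](obj, into)


class Labels(list):
    """Hypothetical object type that declares its own default output format."""

    DEFAULT_FORMAT = "lines"

    def write(self, file, format=None):
        """Object-oriented style: fall back on the class's default format."""
        write_obj(self, format or self.DEFAULT_FORMAT, into=file)


def _write_lines(obj, fh):
    for item in obj:
        fh.write(f"{item}\n")


def _write_csv(obj, fh):
    fh.write(",".join(map(str, obj)) + "\n")


WRITERS["lines"] = _write_lines
WRITERS["csv"] = _write_csv

out = StringIO()
Labels(["a", "b"]).write(out)                # uses the default ("lines")
Labels(["a", "b"]).write(out, format="csv")  # explicit format
print(out.getvalue())
# a
# b
# a,b
```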
956985
</section>
957986
<section id="streaming-files-with-read-and-write">
958987
<h3>Streaming files with read and write<a class="headerlink" href="#streaming-files-with-read-and-write" title="Link to this heading">#</a></h3>
959988
<p>If you are working with particularly large files, streaming them might be preferable.
960-
Scikit-bio’s <code class="docutils literal notranslate"><span class="pre">io</span></code> module offers the ability to contruct a streaming interface from
961-
the <code class="docutils literal notranslate"><span class="pre">read</span></code> and <code class="docutils literal notranslate"><span class="pre">write</span></code> functions.</p>
989+
For instance, if your file is larger than your available memory, you won’t be able
990+
to read the entire file into memory at once. One way to get around this is to use
991+
streaming. Scikit-bio’s <code class="docutils literal notranslate"><span class="pre">io</span></code> module offers the ability to construct a streaming
992+
interface from the <code class="docutils literal notranslate"><span class="pre">read</span></code> and <code class="docutils literal notranslate"><span class="pre">write</span></code> functions.</p>
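The reason streaming keeps memory bounded is that each record is parsed, transformed, and written before the next one is read. The sketch below shows that generator pattern with the standard library only. The function names are hypothetical, and the parser assumes the simple four-line FASTQ layout (no wrapped lines, no trailing blank lines), so it illustrates the pattern rather than replacing scikit-bio's readers and writers.

```python
from io import StringIO


def read_fastq(fh):
    """Lazily yield (id, sequence, quality) from four-line FASTQ records."""
    while True:
        header = fh.readline()
        if not header:
            return  # end of input
        seq = fh.readline().rstrip("\n")
        plus = fh.readline()
        qual = fh.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at {header!r}")
        yield header[1:].rstrip("\n"), seq, qual


def write_fasta(records, out):
    """Write each record as soon as it arrives; nothing is accumulated."""
    n = 0
    for rec_id, seq, _qual in records:
        out.write(f">{rec_id}\n{seq}\n")
        n += 1
    return n


# In real use, big_file would be a large file opened from disk.
big_file = StringIO("@r1\nACGT\n+\nIIII\n@r2\nGG\n+\nII\n")
out = StringIO()
print(write_fasta(read_fastq(big_file), out))  # 2
print(out.getvalue())
# >r1
# ACGT
# >r2
# GG
```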
962993
<p>When <code class="docutils literal notranslate"><span class="pre">into</span></code> is omitted, <code class="docutils literal notranslate"><span class="pre">skbio.io.read</span></code> returns a generator, which can then be passed to <code class="docutils literal notranslate"><span class="pre">skbio.io.write</span></code>
963994
to write only one chunk from the generator at a time.</p>
964995
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">seq_gen</span> <span class="o">=</span> <span class="n">skbio</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">big_file</span><span class="p">,</span> <span class="nb">format</span><span class="o">=</span><span class="s1">&#39;someformat&#39;</span><span class="p">)</span>

docs/dev/objects.inv

17 Bytes
Binary file not shown.

docs/dev/searchindex.js

Lines changed: 1 addition & 1 deletion