Commit 46ef7b1

add itertools.islice

1 parent 66b1a3a commit 46ef7b1

4 files changed, +118 -3 lines changed

Chapter2/itertools.ipynb

Lines changed: 46 additions & 1 deletion

@@ -665,6 +665,51 @@
     "chars = dropwhile(lambda char: char.islower(), word)\n",
     "''.join(chars)"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a2b81830",
+   "metadata": {},
+   "source": [
+    "### itertools.islice: Efficient Data Processing for Large Data Streams"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ee33429d",
+   "metadata": {},
+   "source": [
+    "Working with large data streams or files can be challenging due to memory limitations. Index slicing is not feasible for extremely large datasets, since it requires first loading the entire dataset into memory as a list.\n",
+    "\n",
+    "```python\n",
+    "# Load all log entries into memory as a list\n",
+    "large_log = [log_entry for log_entry in open(\"large_log_file.log\")]\n",
+    "\n",
+    "# Iterate over the first 100 entries of the list\n",
+    "for entry in large_log[:100]:\n",
+    "    process_log_entry(entry)\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "105480cf",
+   "metadata": {},
+   "source": [
+    "itertools.islice() lets you process a portion of the data stream at a time without loading the entire dataset into memory, which keeps memory usage low even for very large files.\n",
+    "\n",
+    "\n",
+    "```python\n",
+    "import itertools\n",
+    "\n",
+    "# Lazily read file lines with a generator\n",
+    "large_log = (log_entry for log_entry in open(\"large_log_file.log\"))\n",
+    "\n",
+    "# Get the first 100 entries from the generator\n",
+    "for entry in itertools.islice(large_log, 100):\n",
+    "    process_log_entry(entry)\n",
+    "```"
+   ]
   }
  ],
  "metadata": {
@@ -684,7 +729,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.6"
+   "version": "3.11.6"
  },
  "toc": {
   "base_numbering": 1,

docs/Chapter2/itertools.html

Lines changed: 25 additions & 0 deletions

@@ -519,6 +519,7 @@ <h2> Contents </h2>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#itertools-groupby-group-elements-in-an-iterable-by-a-key">3.2.5. itertools.groupby: Group Elements in an Iterable by a Key</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#itertools-zip-longest-zip-iterables-of-different-lengths">3.2.6. itertools.zip_longest: Zip Iterables of Different Lengths</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#itertools-dropwhile-drop-elements-in-an-iterable-until-a-condition-is-false">3.2.7. itertools.dropwhile: Drop Elements in an Iterable Until a Condition Is False</a></li>
+<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#itertools-islice-efficient-data-processing-for-large-data-streams">3.2.8. itertools.islice: Efficient Data Processing for Large Data Streams</a></li>
 </ul>
 </nav>
 </div>
@@ -894,6 +895,29 @@ <h2><span class="section-number">3.2.7. </span>itertools.dropwhile: Drop Element
 </div>
 </div>
 </section>
+<section id="itertools-islice-efficient-data-processing-for-large-data-streams">
+<h2><span class="section-number">3.2.8. </span>itertools.islice: Efficient Data Processing for Large Data Streams<a class="headerlink" href="#itertools-islice-efficient-data-processing-for-large-data-streams" title="Permalink to this heading">#</a></h2>
+<p>Working with large data streams or files can be challenging due to memory limitations. Index slicing is not feasible for extremely large datasets, since it requires first loading the entire dataset into memory as a list.</p>
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># Load all log entries into memory as a list</span>
+<span class="n">large_log</span> <span class="o">=</span> <span class="p">[</span><span class="n">log_entry</span> <span class="k">for</span> <span class="n">log_entry</span> <span class="ow">in</span> <span class="nb">open</span><span class="p">(</span><span class="s2">&quot;large_log_file.log&quot;</span><span class="p">)]</span>
+
+<span class="c1"># Iterate over the first 100 entries of the list</span>
+<span class="k">for</span> <span class="n">entry</span> <span class="ow">in</span> <span class="n">large_log</span><span class="p">[:</span><span class="mi">100</span><span class="p">]:</span>
+    <span class="n">process_log_entry</span><span class="p">(</span><span class="n">entry</span><span class="p">)</span>
+</pre></div>
+</div>
+<p>itertools.islice() lets you process a portion of the data stream at a time without loading the entire dataset into memory, which keeps memory usage low even for very large files.</p>
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">itertools</span>
+
+<span class="c1"># Lazily read file lines with a generator</span>
+<span class="n">large_log</span> <span class="o">=</span> <span class="p">(</span><span class="n">log_entry</span> <span class="k">for</span> <span class="n">log_entry</span> <span class="ow">in</span> <span class="nb">open</span><span class="p">(</span><span class="s2">&quot;large_log_file.log&quot;</span><span class="p">))</span>
+
+<span class="c1"># Get the first 100 entries from the generator</span>
+<span class="k">for</span> <span class="n">entry</span> <span class="ow">in</span> <span class="n">itertools</span><span class="o">.</span><span class="n">islice</span><span class="p">(</span><span class="n">large_log</span><span class="p">,</span> <span class="mi">100</span><span class="p">):</span>
+    <span class="n">process_log_entry</span><span class="p">(</span><span class="n">entry</span><span class="p">)</span>
+</pre></div>
+</div>
+</section>
 </section>

 <script type="text/x-thebe-config">
@@ -966,6 +990,7 @@ <h2><span class="section-number">3.2.7. </span>itertools.dropwhile: Drop Element
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#itertools-groupby-group-elements-in-an-iterable-by-a-key">3.2.5. itertools.groupby: Group Elements in an Iterable by a Key</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#itertools-zip-longest-zip-iterables-of-different-lengths">3.2.6. itertools.zip_longest: Zip Iterables of Different Lengths</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#itertools-dropwhile-drop-elements-in-an-iterable-until-a-condition-is-false">3.2.7. itertools.dropwhile: Drop Elements in an Iterable Until a Condition Is False</a></li>
+<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#itertools-islice-efficient-data-processing-for-large-data-streams">3.2.8. itertools.islice: Efficient Data Processing for Large Data Streams</a></li>
 </ul>
 </nav></div>
docs/_sources/Chapter2/itertools.ipynb

Lines changed: 46 additions & 1 deletion

@@ -665,6 +665,51 @@
     "chars = dropwhile(lambda char: char.islower(), word)\n",
     "''.join(chars)"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a2b81830",
+   "metadata": {},
+   "source": [
+    "### itertools.islice: Efficient Data Processing for Large Data Streams"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ee33429d",
+   "metadata": {},
+   "source": [
+    "Working with large data streams or files can be challenging due to memory limitations. Index slicing is not feasible for extremely large datasets, since it requires first loading the entire dataset into memory as a list.\n",
+    "\n",
+    "```python\n",
+    "# Load all log entries into memory as a list\n",
+    "large_log = [log_entry for log_entry in open(\"large_log_file.log\")]\n",
+    "\n",
+    "# Iterate over the first 100 entries of the list\n",
+    "for entry in large_log[:100]:\n",
+    "    process_log_entry(entry)\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "105480cf",
+   "metadata": {},
+   "source": [
+    "itertools.islice() lets you process a portion of the data stream at a time without loading the entire dataset into memory, which keeps memory usage low even for very large files.\n",
+    "\n",
+    "\n",
+    "```python\n",
+    "import itertools\n",
+    "\n",
+    "# Lazily read file lines with a generator\n",
+    "large_log = (log_entry for log_entry in open(\"large_log_file.log\"))\n",
+    "\n",
+    "# Get the first 100 entries from the generator\n",
+    "for entry in itertools.islice(large_log, 100):\n",
+    "    process_log_entry(entry)\n",
+    "```"
+   ]
   }
  ],
  "metadata": {
@@ -684,7 +729,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.6"
+   "version": "3.11.6"
  },
  "toc": {
   "base_numbering": 1,

docs/searchindex.js

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.
