Commit 46ef7b1

add itertools.islice

1 parent 66b1a3a commit 46ef7b1

4 files changed, +118 -3 lines changed

Chapter2/itertools.ipynb

Lines changed: 46 additions & 1 deletion

@@ -665,6 +665,51 @@
     "chars = dropwhile(lambda char: char.islower(), word)\n",
     "''.join(chars)"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a2b81830",
+   "metadata": {},
+   "source": [
+    "### itertools.islice: Efficient Data Processing for Large Data Streams"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ee33429d",
+   "metadata": {},
+   "source": [
+    "Working with large data streams or files can be challenging due to memory limitations. Index slicing is not feasible for extremely large datasets, since it requires first loading the entire dataset into memory as a list.\n",
+    "\n",
+    "```python\n",
+    "# Load all log entries into memory as a list\n",
+    "large_log = [log_entry for log_entry in open(\"large_log_file.log\")]\n",
+    "\n",
+    "# Iterate over the first 100 entries of the list\n",
+    "for entry in large_log[:100]:\n",
+    "    process_log_entry(entry)\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "105480cf",
+   "metadata": {},
+   "source": [
+    "itertools.islice() lets you process a portion of the data stream at a time without loading the entire dataset into memory, which keeps memory usage low even for very large files.\n",
+    "\n",
+    "\n",
+    "```python\n",
+    "import itertools\n",
+    "\n",
+    "# Lazily read file lines with a generator\n",
+    "large_log = (log_entry for log_entry in open(\"large_log_file.log\"))\n",
+    "\n",
+    "# Get the first 100 entries from the generator\n",
+    "for entry in itertools.islice(large_log, 100):\n",
+    "    process_log_entry(entry)\n",
+    "```"
+   ]
   }
  ],
  "metadata": {
@@ -684,7 +729,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.6"
+   "version": "3.11.6"
  },
  "toc": {
   "base_numbering": 1,

docs/Chapter2/itertools.html

Lines changed: 25 additions & 0 deletions

@@ -519,6 +519,7 @@ <h2> Contents </h2>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#itertools-groupby-group-elements-in-an-iterable-by-a-key">3.2.5. itertools.groupby: Group Elements in an Iterable by a Key</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#itertools-zip-longest-zip-iterables-of-different-lengths">3.2.6. itertools.zip_longest: Zip Iterables of Different Lengths</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#itertools-dropwhile-drop-elements-in-an-iterable-until-a-condition-is-false">3.2.7. itertools.dropwhile: Drop Elements in an Iterable Until a Condition Is False</a></li>
+<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#itertools-islice-efficient-data-processing-for-large-data-streams">3.2.8. itertools.islice: Efficient Data Processing for Large Data Streams</a></li>
 </ul>
 </nav>
 </div>
@@ -894,6 +895,29 @@ <h2><span class="section-number">3.2.7. </span>itertools.dropwhile: Drop Element
 </div>
 </div>
 </section>
+<section id="itertools-islice-efficient-data-processing-for-large-data-streams">
+<h2><span class="section-number">3.2.8. </span>itertools.islice: Efficient Data Processing for Large Data Streams<a class="headerlink" href="#itertools-islice-efficient-data-processing-for-large-data-streams" title="Permalink to this heading">#</a></h2>
+<p>Working with large data streams or files can be challenging due to memory limitations. Index slicing is not feasible for extremely large datasets, since it requires first loading the entire dataset into memory as a list.</p>
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># Load all log entries into memory as a list</span>
+<span class="n">large_log</span> <span class="o">=</span> <span class="p">[</span><span class="n">log_entry</span> <span class="k">for</span> <span class="n">log_entry</span> <span class="ow">in</span> <span class="nb">open</span><span class="p">(</span><span class="s2">&quot;large_log_file.log&quot;</span><span class="p">)]</span>
+
+<span class="c1"># Iterate over the first 100 entries of the list</span>
+<span class="k">for</span> <span class="n">entry</span> <span class="ow">in</span> <span class="n">large_log</span><span class="p">[:</span><span class="mi">100</span><span class="p">]:</span>
+    <span class="n">process_log_entry</span><span class="p">(</span><span class="n">entry</span><span class="p">)</span>
+</pre></div>
+</div>
+<p>itertools.islice() lets you process a portion of the data stream at a time without loading the entire dataset into memory, which keeps memory usage low even for very large files.</p>
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">itertools</span>
+
+<span class="c1"># Lazily read file lines with a generator</span>
+<span class="n">large_log</span> <span class="o">=</span> <span class="p">(</span><span class="n">log_entry</span> <span class="k">for</span> <span class="n">log_entry</span> <span class="ow">in</span> <span class="nb">open</span><span class="p">(</span><span class="s2">&quot;large_log_file.log&quot;</span><span class="p">))</span>
+
+<span class="c1"># Get the first 100 entries from the generator</span>
+<span class="k">for</span> <span class="n">entry</span> <span class="ow">in</span> <span class="n">itertools</span><span class="o">.</span><span class="n">islice</span><span class="p">(</span><span class="n">large_log</span><span class="p">,</span> <span class="mi">100</span><span class="p">):</span>
+    <span class="n">process_log_entry</span><span class="p">(</span><span class="n">entry</span><span class="p">)</span>
+</pre></div>
+</div>
+</section>
 </section>

 <script type="text/x-thebe-config">
@@ -966,6 +990,7 @@ <h2><span class="section-number">3.2.7. </span>itertools.dropwhile: Drop Element
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#itertools-groupby-group-elements-in-an-iterable-by-a-key">3.2.5. itertools.groupby: Group Elements in an Iterable by a Key</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#itertools-zip-longest-zip-iterables-of-different-lengths">3.2.6. itertools.zip_longest: Zip Iterables of Different Lengths</a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#itertools-dropwhile-drop-elements-in-an-iterable-until-a-condition-is-false">3.2.7. itertools.dropwhile: Drop Elements in an Iterable Until a Condition Is False</a></li>
+<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#itertools-islice-efficient-data-processing-for-large-data-streams">3.2.8. itertools.islice: Efficient Data Processing for Large Data Streams</a></li>
 </ul>
 </nav></div>
docs/_sources/Chapter2/itertools.ipynb

Lines changed: 46 additions & 1 deletion

@@ -665,6 +665,51 @@
     "chars = dropwhile(lambda char: char.islower(), word)\n",
     "''.join(chars)"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a2b81830",
+   "metadata": {},
+   "source": [
+    "### itertools.islice: Efficient Data Processing for Large Data Streams"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ee33429d",
+   "metadata": {},
+   "source": [
+    "Working with large data streams or files can be challenging due to memory limitations. Index slicing is not feasible for extremely large datasets, since it requires first loading the entire dataset into memory as a list.\n",
+    "\n",
+    "```python\n",
+    "# Load all log entries into memory as a list\n",
+    "large_log = [log_entry for log_entry in open(\"large_log_file.log\")]\n",
+    "\n",
+    "# Iterate over the first 100 entries of the list\n",
+    "for entry in large_log[:100]:\n",
+    "    process_log_entry(entry)\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "105480cf",
+   "metadata": {},
+   "source": [
+    "itertools.islice() lets you process a portion of the data stream at a time without loading the entire dataset into memory, which keeps memory usage low even for very large files.\n",
+    "\n",
+    "\n",
+    "```python\n",
+    "import itertools\n",
+    "\n",
+    "# Lazily read file lines with a generator\n",
+    "large_log = (log_entry for log_entry in open(\"large_log_file.log\"))\n",
+    "\n",
+    "# Get the first 100 entries from the generator\n",
+    "for entry in itertools.islice(large_log, 100):\n",
+    "    process_log_entry(entry)\n",
+    "```"
+   ]
   }
  ],
  "metadata": {
@@ -684,7 +729,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.6"
+   "version": "3.11.6"
  },
  "toc": {
   "base_numbering": 1,

docs/searchindex.js

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.
