CodeCutTech
diff --git a/Chapter5/feature_engineer.ipynb b/Chapter5/feature_engineer.ipynb
Lines changed: 81 additions & 237 deletions
diff --git a/Chapter5/machine_learning.ipynb b/Chapter5/machine_learning.ipynb
Lines changed: 9 additions & 162 deletions
diff --git a/Chapter5/time_series.ipynb b/Chapter5/time_series.ipynb
Lines changed: 46 additions & 29 deletions
diff --git a/Chapter6/better_outputs.ipynb b/Chapter6/better_outputs.ipynb
Lines changed: 169 additions & 1 deletion
diff --git a/docs/Chapter5/feature_engineer.html b/docs/Chapter5/feature_engineer.html
Lines changed: 45 additions & 3 deletions
@@ -464,37 +464,45 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 17,
-   "id": "9563e0f4",
+   "execution_count": 15,
+   "id": "f7bafb95",
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Toronto Time: 2024-03-26 10:19:01.843320-04:00\n",
-      "Paris Time: 2024-03-26 15:19:01.843320+01:00\n",
-      "Future datetime (after adding two days): 2024-03-28 10:19:01.843320-04:00\n"
+      "Current time: 2024-09-02 19:18:13.589774\n",
+      "7 days from now: 2024-09-09 19:18:13.589774\n",
+      "Time in Tokyo: 2024-09-03 09:18:13.590063+09:00\n",
+      "Difference: -477 days, 19:11:46.410226\n"
      ]
     }
    ],
    "source": [
     "from datetime import datetime, timedelta\n",
     "import pytz\n",
     "\n",
-    "# Get current time in Paris\n",
-    "paris_time = datetime.now(pytz.timezone(\"Europe/Paris\"))\n",
+    "# Creating a datetime\n",
+    "now = datetime.now()\n",
+    "print(f\"Current time: {now}\")\n",
+    "\n",
+    "# Date arithmetic\n",
+    "future = now + timedelta(days=7)\n",
+    "print(f\"7 days from now: {future}\")\n",
     "\n",
-    "# Convert Paris time to Toronto time\n",
-    "toronto_timezone = pytz.timezone(\"America/Toronto\")\n",
-    "toronto_time = paris_time.astimezone(toronto_timezone)\n",
+    "# Timezone handling\n",
+    "utc_now = datetime.now(pytz.UTC)\n",
+    "tokyo_tz = pytz.timezone('Asia/Tokyo')\n",
+    "tokyo_time = utc_now.astimezone(tokyo_tz)\n",
+    "print(f\"Time in Tokyo: {tokyo_time}\")\n",
     "\n",
-    "# Add two days\n",
-    "future_datetime = toronto_time + timedelta(days=2)\n",
+    "# Parsing (requires exact format)\n",
+    "parsed = datetime.strptime(\"2023-05-15 14:30:00\", \"%Y-%m-%d %H:%M:%S\")\n",
     "\n",
-    "print(\"Toronto Time:\", toronto_time)\n",
-    "print(\"Paris Time:\", paris_time)\n",
-    "print(\"Future datetime (after adding two days):\", future_datetime)"
+    "# Time difference (not human-readable)\n",
+    "diff = parsed - now\n",
+    "print(f\"Difference: {diff}\")"
    ]
   },
   {
@@ -507,35 +515,44 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 18,
-   "id": "9158e7f6",
+   "execution_count": 4,
+   "id": "7ea99b14",
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Toronto Time: 2024-03-26 10:19:03.398059-04:00\n",
-      "Paris Time: 2024-03-26 15:19:03.398059+01:00\n",
-      "Future datetime (after adding two days): 2024-03-28 10:19:03.398059-04:00\n"
+      "Current time: 2024-09-02 18:58:20.467988-05:00\n",
+      "7 days from now: 2024-09-09 18:58:20.467988-05:00\n",
+      "Time in Tokyo: 2024-09-03 08:58:20.467988+09:00\n",
+      "Parsed date: 2023-05-15 14:30:00+00:00\n",
+      "Difference: -1 year -3 months -2 weeks -4 days -9 hours -28 minutes -20 seconds\n"
      ]
     }
    ],
    "source": [
     "import pendulum\n",
     "\n",
-    "# Get current time in Paris\n",
-    "paris_time = pendulum.now(\"Europe/Paris\")\n",
+    "# Creating a datetime\n",
+    "now = pendulum.now()\n",
+    "print(f\"Current time: {now}\")\n",
+    "\n",
+    "# Date arithmetic (more intuitive than datetime)\n",
+    "future = now.add(days=7)\n",
+    "print(f\"7 days from now: {future}\")\n",
     "\n",
-    "# Convert Paris time to Toronto time\n",
-    "toronto_time = paris_time.in_timezone(\"America/Toronto\")\n",
+    "# Timezone handling\n",
+    "tokyo_time = now.in_timezone(\"Asia/Tokyo\")\n",
+    "print(f\"Time in Tokyo: {tokyo_time}\")\n",
     "\n",
-    "# Add two days\n",
-    "future_datetime = toronto_time.add(days=2)\n",
+    "# Parsing without specifying format\n",
+    "parsed = pendulum.parse(\"2023-05-15 14:30:00\")\n",
+    "print(f\"Parsed date: {parsed}\")\n",
     "\n",
-    "print(\"Toronto Time:\", toronto_time)\n",
-    "print(\"Paris Time:\", paris_time)\n",
-    "print(\"Future datetime (after adding two days):\", future_datetime)"
+    "# Human-readable differences\n",
+    "diff = parsed - now\n",
+    "print(f\"Difference: {diff.in_words()}\")"
    ]
   },
   {
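
Note on the "Difference" outputs in the two cells above: the standard-library result `-477 days, 19:11:46.410226` is `timedelta`'s normalized form, in which a negative duration is stored as negative days plus a positive time of day. The sketch below is not part of the commit; it reuses the timestamps from the notebook output as fixed values and adds a hypothetical helper, `describe`, that prints the magnitude with a single sign. Unlike pendulum's `in_words()`, it does not use calendar units such as months or years.

```python
from datetime import datetime, timedelta


def describe(delta: timedelta) -> str:
    """Hypothetical helper: spell out a timedelta in days/hours/minutes/seconds."""
    sign = "-" if delta < timedelta(0) else ""
    total = int(abs(delta).total_seconds())  # fractional seconds are dropped
    days, rem = divmod(total, 86_400)
    hours, rem = divmod(rem, 3_600)
    minutes, seconds = divmod(rem, 60)
    parts = [(days, "day"), (hours, "hour"), (minutes, "minute"), (seconds, "second")]
    words = [f"{n} {unit}{'' if n == 1 else 's'}" for n, unit in parts if n]
    return sign + (" ".join(words) or "0 seconds")


# Fixed values copied from the notebook output so the result is reproducible.
parsed = datetime(2023, 5, 15, 14, 30, 0)
now = datetime(2024, 9, 2, 19, 18, 13, 589774)
diff = parsed - now

print(diff)            # normalized form: negative days plus a positive time
print(describe(diff))  # magnitude with a single leading sign
```

Read this way, the two representations agree: the gap is a little over 476 days, and the `-477 days` form is that value rounded down to whole days with the remainder added back as a positive time.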
 
@@ -741,6 +741,174 @@
     "[Link to latexify_py](https://github.com/google/latexify_py)."
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "4c8c79c1-23df-4b11-ae56-d9050e03317d",
+   "metadata": {},
+   "source": [
+    "### From Python to Paper: Visualizing Calculations with Handcalcs"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "aaab648d-549f-404c-84d9-73139fa1495e",
+   "metadata": {
+    "tags": [
+     "hide-cell"
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "!pip install handcalcs"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4c1b38f2-edb2-4a36-9f68-7e62a8f3d4b9",
+   "metadata": {},
+   "source": [
+    "A Python calculation that shows only its final result hides the steps that produced it.\n",
+    "\n",
+    "Handcalcs addresses this by rendering the calculation as LaTeX, laid out like a handwritten one: the symbolic formula, the numeric substitution, and the result for each line.\n",
+    "\n",
+    "This makes each calculation easier to read and to check by hand.\n",
+    "\n",
+    "Handcalcs can be used in two main ways:\n",
+    "\n",
+    "1. As a cell magic in Jupyter notebooks using `%%render`:\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "dbaac238-05b1-4a8a-8cb8-658f7e39bcff",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import handcalcs.render\n",
+    "from handcalcs.decorator import handcalc"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "f3521696-92ba-437e-a552-e090698c676c",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/latex": [
+       "\\[\n",
+       "\\begin{aligned}\n",
+       "a &= 2 \\; \n",
+       "\\\\[10pt]\n",
+       "b &= 3 \\; \n",
+       "\\\\[10pt]\n",
+       "c &= 2 \\cdot a + \\frac{ b }{ 3 }  = 2 \\cdot 2 + \\frac{ 3 }{ 3 } &= 5.000  \n",
+       "\\end{aligned}\n",
+       "\\]"
+      ],
+      "text/plain": [
+       "<IPython.core.display.Latex object>"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "%%render\n",
+    "a = 2\n",
+    "b = 3\n",
+    "c = 2*a + b/3"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bdb7d726-a5d6-47e7-a05e-3559b4265f0c",
+   "metadata": {},
+   "source": [
+    "2. As a decorator for functions:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "f605eb86-1eb0-4579-8a57-4a8ff50406af",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from math import sqrt\n",
+    "\n",
+    "@handcalc(jupyter_display=True)\n",
+    "def my_calc(x: float, y: float, z: float):\n",
+    "    a = 2*x\n",
+    "    b = 3*a/z + sqrt(a+y/2)\n",
+    "    c = a + b\n",
+    "    return c"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "5e0f4471-b2b3-44a0-a462-52dc2a58a1bd",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/latex": [
+       "\\[\n",
+       "\\begin{aligned}\n",
+       "a &= 2 \\cdot x  = 2 \\cdot 2.300 &= 4.600  \n",
+       "\\\\[10pt]\n",
+       "b &= 3 \\cdot \\frac{ a }{ z } + \\sqrt { a + \\frac{ y }{ 2 } }  = 3 \\cdot \\frac{ 4.600 }{ 1.200 } + \\sqrt { 4.600 + \\frac{ 3.200 }{ 2 } } &= 13.990  \n",
+       "\\\\[10pt]\n",
+       "c &= a + b  = 4.600 + 13.990 &= 18.590  \n",
+       "\\end{aligned}\n",
+       "\\]"
+      ],
+      "text/plain": [
+       "<IPython.core.display.Latex object>"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "result = my_calc(2.3, 3.2, 1.2)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "abca39c7-c8b9-4ef2-b56d-115638cc9316",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "18.589979919597745"
+      ]
+     },
+     "execution_count": 11,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "result"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ae2c0e92-292d-426f-9cc0-b90aed8ee193",
+   "metadata": {},
+   "source": [
+    "[Link to handcalcs](https://github.com/connorferster/handcalcs)."
+   ]
+  },
   {
    "attachments": {},
    "cell_type": "markdown",
@@ -1475,7 +1643,7 @@
   "celltoolbar": "Tags",
   "hide_input": false,
   "kernelspec": {
-   "display_name": "venv",
+   "display_name": "Python 3 (ipykernel)",
    "language": "python",
    "name": "python3"
   },
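
Note on the handcalcs cells added in Chapter6/better_outputs.ipynb: the rendered LaTeX shows values to three decimals (`18.590`), while the `result` cell shows the full float (`18.589979919597745`). The sketch below is not part of the commit; it recomputes the same expressions in plain Python, without handcalcs, to check that the rendered figure is the full-precision value rounded for display. The function name `my_calc_plain` is hypothetical.

```python
from math import isclose, sqrt


def my_calc_plain(x: float, y: float, z: float) -> float:
    """Undecorated copy of my_calc from the diff (name is hypothetical)."""
    a = 2 * x
    b = 3 * a / z + sqrt(a + y / 2)
    c = a + b
    return c


result = my_calc_plain(2.3, 3.2, 1.2)
print(result)            # full-precision float
print(round(result, 3))  # the rendered value, up to the trailing zero
```

The same check works for the intermediate values: `a` is 4.6 and `b` rounds to 13.99, matching the `4.600` and `13.990` in the rendered steps.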
 
@@ -705,6 +705,48 @@ <h2><span class="section-number">6.2.2. </span>Strategy to Prevent Data Leakage
 </div>
 </div>
 </div>
+<p>Time series data has a temporal order, so information from the future must not influence predictions about the past. Standard cross-validation techniques like K-Fold ignore this order: a fold can train on later samples and test on earlier ones, whether or not the data is shuffled first.</p>
+<p>scikit-learn provides a cross-validator designed for time series data: TimeSeriesSplit. It respects the temporal order of the data, so every fold trains on earlier samples and tests on later ones.</p>
+<p>Let’s explore how to use TimeSeriesSplit with a simple example:</p>
+<div class="cell docutils container">
+<div class="cell_input docutils container">
+<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
+<span class="kn">from</span> <span class="nn">sklearn.model_selection</span> <span class="kn">import</span> <span class="n">TimeSeriesSplit</span>
+
+<span class="n">X</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">],</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">],</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">]])</span>
+<span class="n">y</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">])</span>
+
+<span class="n">tscv</span> <span class="o">=</span> <span class="n">TimeSeriesSplit</span><span class="p">(</span><span class="n">n_splits</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
+
+<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="p">(</span><span class="n">train_index</span><span class="p">,</span> <span class="n">test_index</span><span class="p">)</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">tscv</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">X</span><span class="p">)):</span>
+    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;Fold </span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s2">:&quot;</span><span class="p">)</span>
+    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;  Train: index=</span><span class="si">{</span><span class="n">train_index</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span>
+    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;  Test:  index=</span><span class="si">{</span><span class="n">test_index</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span>
+</pre></div>
+</div>
+</div>
+<div class="cell_output docutils container">
+<div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>Fold 0:
+  Train: index=[0 1 2]
+  Test:  index=[3]
+Fold 1:
+  Train: index=[0 1 2 3]
+  Test:  index=[4]
+Fold 2:
+  Train: index=[0 1 2 3 4]
+  Test:  index=[5]
+</pre></div>
+</div>
+</div>
+</div>
+<p>From the outputs, we can see that:</p>
+<ol class="arabic simple">
+<li><p>Temporal Integrity: The split always respects the original order of the data.</p></li>
+<li><p>Growing Training Set: With each fold, the training set expands to include more historical data.</p></li>
+<li><p>Forward-Moving Test Set: In this example, each test set is a single later sample, and it moves forward with each fold.</p></li>
+<li><p>No Data Leakage: Future information is never used to predict past events.</p></li>
+</ol>
+<p>This approach mimics real-world forecasting scenarios, where models use historical data to predict future outcomes.</p>
 </section>
 <section id="enhancing-data-handling-with-scikit-learn-s-dataframe-support">
 <h2><span class="section-number">6.2.3. </span>Enhancing Data Handling with scikit-learn’s DataFrame Support<a class="headerlink" href="#enhancing-data-handling-with-scikit-learn-s-dataframe-support" title="Permalink to this heading">#</a></h2>
@@ -3763,16 +3805,16 @@ <h2><span class="section-number">6.2.14. </span>sketch: AI Code-Writing Assistan
         },
         codeMirrorConfig: {
             theme: "abcdef",
-            mode: "python"
+            mode: "data-science"
         },
         kernelOptions: {
-            name: "python3",
+            name: "data-science",
             path: "./Chapter5"
         },
         predefinedOutput: true
     }
     </script>
-    <script>kernelName = 'python3'</script>
+    <script>kernelName = 'data-science'</script>
 
                 </article>
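
Note on the TimeSeriesSplit passage added to docs/Chapter5/feature_engineer.html: the fold layout in the output can be reproduced with a few lines of index arithmetic. The sketch below is not part of the commit and is not scikit-learn's implementation; it is a pure-Python stand-in that assumes the default settings (no gap, no cap on training size), and the function name `expanding_window_splits` is hypothetical.

```python
from typing import Iterator, List, Tuple


def expanding_window_splits(
    n_samples: int, n_splits: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; hypothetical stand-in for TimeSeriesSplit."""
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("too few samples for the requested number of splits")
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        train = list(range(test_start))  # everything before the test block
        test = list(range(test_start, test_start + test_size))  # next block in time
        yield train, test


splits = list(expanding_window_splits(n_samples=6, n_splits=3))
for i, (train, test) in enumerate(splits):
    print(f"Fold {i}: train={train} test={test}")
```

With six samples and three splits the test block has size one, which is why each test set in the output above is a single sample; with more samples per split the test block grows accordingly, and every training index still precedes every test index.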