CodeCutTech
diff --git a/Chapter6/alternative_approach.ipynb b/Chapter6/alternative_approach.ipynb
Lines changed: 1 addition & 1 deletion
diff --git a/Chapter6/better_outputs.ipynb b/Chapter6/better_outputs.ipynb
Lines changed: 233 additions & 0 deletions
diff --git a/Chapter6/foo.pdf b/Chapter6/foo.pdf
82.2 KB
diff --git a/Chapter6/workflow_automation.ipynb b/Chapter6/workflow_automation.ipynb
Lines changed: 1 addition & 1 deletion
diff --git a/docs/Chapter6/alternative_approach.html b/docs/Chapter6/alternative_approach.html
Lines changed: 1 addition & 0 deletions
@@ -1578,7 +1578,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.6"
+   "version": "3.11.4"
   },
   "toc": {
    "base_numbering": 1,
 
@@ -991,6 +991,239 @@
    "source": [
     "[Link to Great Tables](https://bit.ly/3U58fvP)."
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "01df33c0",
+   "metadata": {},
+   "source": [
+    "### Camelot: PDF Table Extraction for Humans"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "90e6b894",
+   "metadata": {
+    "tags": [
+     "hide-cell"
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "!pip install \"camelot-py[base]\" \"opencv-python\" \"pypdf2<3\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ec590463",
+   "metadata": {},
+   "source": [
+    "With Camelot, you can extract tables from PDFs using Python and convert the data into a more structured format, such as a pandas DataFrame or a CSV file for efficient analysis, manipulation, and integration."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cce6ec0b",
+   "metadata": {},
+   "source": [
+    "To see how Camelot works, start by reading the PDF file named 'foo.pdf' and extracting all the tables it contains."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "60cbbdf3",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "<TableList n=1>"
+      ]
+     },
+     "execution_count": 1,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "import camelot\n",
+    "tables = camelot.read_pdf('foo.pdf')\n",
+    "tables"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f5125a51",
+   "metadata": {},
+   "source": [
+    "The output `<TableList n=1>` shows that Camelot extracted one table from the PDF file."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "26edcb5c",
+   "metadata": {},
+   "source": [
+    "Export the first extracted table to a CSV file named 'foo.csv'. Camelot can also export tables to JSON, Excel, HTML, Markdown, and SQLite."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "611f33ff",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>0</th>\n",
+       "      <th>1</th>\n",
+       "      <th>2</th>\n",
+       "      <th>3</th>\n",
+       "      <th>4</th>\n",
+       "      <th>5</th>\n",
+       "      <th>6</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>Cycle \\nName</td>\n",
+       "      <td>KI \\n(1/km)</td>\n",
+       "      <td>Distance \\n(mi)</td>\n",
+       "      <td>Percent Fuel Savings</td>\n",
+       "      <td></td>\n",
+       "      <td></td>\n",
+       "      <td></td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td></td>\n",
+       "      <td></td>\n",
+       "      <td></td>\n",
+       "      <td>Improved \\nSpeed</td>\n",
+       "      <td>Decreased \\nAccel</td>\n",
+       "      <td>Eliminate \\nStops</td>\n",
+       "      <td>Decreased \\nIdle</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>2012_2</td>\n",
+       "      <td>3.30</td>\n",
+       "      <td>1.3</td>\n",
+       "      <td>5.9%</td>\n",
+       "      <td>9.5%</td>\n",
+       "      <td>29.2%</td>\n",
+       "      <td>17.4%</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>2145_1</td>\n",
+       "      <td>0.68</td>\n",
+       "      <td>11.2</td>\n",
+       "      <td>2.4%</td>\n",
+       "      <td>0.1%</td>\n",
+       "      <td>9.5%</td>\n",
+       "      <td>2.7%</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>4234_1</td>\n",
+       "      <td>0.59</td>\n",
+       "      <td>58.7</td>\n",
+       "      <td>8.5%</td>\n",
+       "      <td>1.3%</td>\n",
+       "      <td>8.5%</td>\n",
+       "      <td>3.3%</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>5</th>\n",
+       "      <td>2032_2</td>\n",
+       "      <td>0.17</td>\n",
+       "      <td>57.8</td>\n",
+       "      <td>21.7%</td>\n",
+       "      <td>0.3%</td>\n",
+       "      <td>2.7%</td>\n",
+       "      <td>1.2%</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>6</th>\n",
+       "      <td>4171_1</td>\n",
+       "      <td>0.07</td>\n",
+       "      <td>173.9</td>\n",
+       "      <td>58.1%</td>\n",
+       "      <td>1.6%</td>\n",
+       "      <td>2.1%</td>\n",
+       "      <td>0.5%</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "              0            1                2                     3  \\\n",
+       "0  Cycle \\nName  KI \\n(1/km)  Distance \\n(mi)  Percent Fuel Savings   \n",
+       "1                                                  Improved \\nSpeed   \n",
+       "2        2012_2         3.30              1.3                  5.9%   \n",
+       "3        2145_1         0.68             11.2                  2.4%   \n",
+       "4        4234_1         0.59             58.7                  8.5%   \n",
+       "5        2032_2         0.17             57.8                 21.7%   \n",
+       "6        4171_1         0.07            173.9                 58.1%   \n",
+       "\n",
+       "                   4                  5                 6  \n",
+       "0                                                          \n",
+       "1  Decreased \\nAccel  Eliminate \\nStops  Decreased \\nIdle  \n",
+       "2               9.5%              29.2%             17.4%  \n",
+       "3               0.1%               9.5%              2.7%  \n",
+       "4               1.3%               8.5%              3.3%  \n",
+       "5               0.3%               2.7%              1.2%  \n",
+       "6               1.6%               2.1%              0.5%  "
+      ]
+     },
+     "execution_count": 3,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
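+    "# Inspect extraction quality for the first table.\n",
+    "# Example report: {'accuracy': 99.02, 'whitespace': 12.24, 'order': 1, 'page': 1}\n",
+    "tables[0].parsing_report\n",
+    "\n",
+    "# Export the first table; also available: to_json, to_excel, to_html, to_markdown, to_sqlite\n",
+    "tables[0].to_csv('foo.csv')\n",
+    "\n",
+    "# Access the table as a pandas DataFrame (displayed as the cell output)\n",
+    "tables[0].df"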
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9c914e50",
+   "metadata": {},
+   "source": [
+    "[Link to Camelot](https://bit.ly/3xPBw6L)."
+   ]
   }
  ],
  "metadata": {
 
@@ -1925,7 +1925,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.6"
+   "version": "3.11.4"
   },
   "toc": {
    "base_numbering": 1,
 
@@ -234,6 +234,7 @@
 <li class="toctree-l2"><a class="reference internal" href="../Chapter2/dataclasses.html">3.7. Data Classes</a></li>
 <li class="toctree-l2"><a class="reference internal" href="../Chapter2/typing.html">3.8. Typing</a></li>
 <li class="toctree-l2"><a class="reference internal" href="../Chapter2/pathlib.html">3.9. pathlib</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../Chapter2/pydantic.html">3.10. Pydantic</a></li>
 </ul>
 </li>
 <li class="toctree-l1 has-children"><a class="reference internal" href="../Chapter3/Chapter3.html">4. Pandas</a><input class="toctree-checkbox" id="toctree-checkbox-4" name="toctree-checkbox-4" type="checkbox"/><label class="toctree-toggle" for="toctree-checkbox-4"><i class="fa-solid fa-chevron-down"></i></label><ul>