Commit 8298c12

add duckdb csv
1 parent c6e7d3c commit 8298c12

19 files changed, +1677 -96 lines changed

Chapter5/SQL.ipynb

Lines changed: 126 additions & 0 deletions
@@ -828,6 +828,132 @@
828828
"[Link to DuckDB](https://github.com/duckdb/duckdb)."
829829
]
830830
},
831+
{
832+
"cell_type": "markdown",
833+
"id": "300d0507",
834+
"metadata": {},
835+
"source": [
836+
"### Simplify CSV Data Management with DuckDB"
837+
]
838+
},
839+
{
840+
"cell_type": "code",
841+
"execution_count": null,
842+
"id": "d00f897e",
843+
"metadata": {
844+
"tags": [
845+
"hide-cell"
846+
]
847+
},
848+
"outputs": [],
849+
"source": [
850+
"!pip install duckdb"
851+
]
852+
},
853+
{
854+
"cell_type": "code",
855+
"execution_count": 1,
856+
"id": "1eeb22f1",
857+
"metadata": {
858+
"tags": [
859+
"hide-cell"
860+
]
861+
},
862+
"outputs": [],
863+
"source": [
864+
"import pandas as pd\n",
865+
"\n",
866+
"# Create a sample dataframe\n",
867+
"data = {\n",
868+
" \"name\": [\"Alice\", \"Bob\", \"Charlie\", \"David\", \"Eve\"],\n",
869+
" \"age\": [25, 32, 45, 19, 38],\n",
870+
" \"city\": [\"New York\", \"London\", \"Paris\", \"Berlin\", \"Tokyo\"],\n",
871+
"}\n",
872+
"\n",
873+
"df = pd.DataFrame(data)\n",
874+
"\n",
875+
"# Save the dataframe as a CSV file\n",
876+
"df.to_csv(\"customers.csv\", index=False)"
877+
]
878+
},
879+
{
880+
"cell_type": "markdown",
881+
"id": "c30018df",
882+
"metadata": {},
883+
"source": [
884+
"Traditional database systems such as Postgres require you to define a table schema and then import the data before you can query a CSV file.\n",
885+
"\n"
886+
]
887+
},
888+
{
889+
"cell_type": "markdown",
890+
"id": "7de593b3",
891+
"metadata": {},
892+
"source": [
893+
"```sql\n",
894+
"CREATE TABLE customers (\n",
895+
" id SERIAL PRIMARY KEY,\n",
896+
" name TEXT,\n",
897+
" age INTEGER,\n city TEXT\n",
898+
");\n",
899+
"\n",
900+
"COPY customers(name, age, city)\n",
901+
"FROM 'customers.csv'\n",
902+
"DELIMITER ','\n",
903+
"CSV HEADER;\n",
904+
"\n",
905+
"SELECT * FROM customers;\n",
906+
"```"
907+
]
908+
},
909+
{
910+
"cell_type": "markdown",
911+
"id": "33bf4b63",
912+
"metadata": {},
913+
"source": [
914+
"In contrast, DuckDB can query a CSV file on disk directly and infers the column names and types, so no explicit table creation or data loading is needed."
915+
]
916+
},
917+
{
918+
"cell_type": "code",
919+
"execution_count": 4,
920+
"id": "c5970f01",
921+
"metadata": {},
922+
"outputs": [
923+
{
924+
"data": {
925+
"text/plain": [
926+
"┌─────────┬───────┬──────────┐\n",
927+
"│  name   │  age  │   city   │\n",
928+
"│ varchar │ int64 │ varchar  │\n",
929+
"├─────────┼───────┼──────────┤\n",
930+
"│ Alice   │    25 │ New York │\n",
931+
"│ Bob     │    32 │ London   │\n",
932+
"│ Charlie │    45 │ Paris    │\n",
933+
"│ David   │    19 │ Berlin   │\n",
934+
"│ Eve     │    38 │ Tokyo    │\n",
935+
"└─────────┴───────┴──────────┘"
936+
]
937+
},
938+
"execution_count": 4,
939+
"metadata": {},
940+
"output_type": "execute_result"
941+
}
942+
],
943+
"source": [
944+
"import duckdb\n",
945+
"\n",
946+
"duckdb.sql(\"SELECT * FROM 'customers.csv'\")"
947+
]
948+
},
949+
{
950+
"cell_type": "markdown",
951+
"id": "e5b77a38",
952+
"metadata": {},
953+
"source": [
954+
"[Link to DuckDB](https://bit.ly/4dJxNHV)."
955+
]
956+
},
831957
{
832958
"cell_type": "markdown",
833959
"id": "7d23ba4e",

Chapter5/llm.ipynb

Lines changed: 1 addition & 1 deletion
@@ -397,7 +397,7 @@
397397
"source": [
398398
"import sqlite3\n",
399399
"\n",
400-
"# Connect to the database (or create it if it doesn't exist)\n",
400+
"# Set up database and table for the example below\n",
401401
"conn = sqlite3.connect(\"grocery.db\")\n",
402402
"cursor = conn.cursor()\n",
403403
"\n",

docs/Chapter2/Chapter2.html

Lines changed: 1 addition & 0 deletions
@@ -234,6 +234,7 @@
234234
<li class="toctree-l2"><a class="reference internal" href="dataclasses.html">3.7. Data Classes</a></li>
235235
<li class="toctree-l2"><a class="reference internal" href="typing.html">3.8. Typing</a></li>
236236
<li class="toctree-l2"><a class="reference internal" href="pathlib.html">3.9. pathlib</a></li>
237+
<li class="toctree-l2"><a class="reference internal" href="pydantic.html">3.10. Pydantic</a></li>
237238
</ul>
238239
</li>
239240
<li class="toctree-l1 has-children"><a class="reference internal" href="../Chapter3/Chapter3.html">4. Pandas</a><input class="toctree-checkbox" id="toctree-checkbox-4" name="toctree-checkbox-4" type="checkbox"/><label class="toctree-toggle" for="toctree-checkbox-4"><i class="fa-solid fa-chevron-down"></i></label><ul>

docs/Chapter2/dataclasses.html

Lines changed: 1 addition & 42 deletions
@@ -234,6 +234,7 @@
234234
<li class="toctree-l2 current active"><a class="current reference internal" href="#">3.7. Data Classes</a></li>
235235
<li class="toctree-l2"><a class="reference internal" href="typing.html">3.8. Typing</a></li>
236236
<li class="toctree-l2"><a class="reference internal" href="pathlib.html">3.9. pathlib</a></li>
237+
<li class="toctree-l2"><a class="reference internal" href="pydantic.html">3.10. Pydantic</a></li>
237238
</ul>
238239
</li>
239240
<li class="toctree-l1 has-children"><a class="reference internal" href="../Chapter3/Chapter3.html">4. Pandas</a><input class="toctree-checkbox" id="toctree-checkbox-4" name="toctree-checkbox-4" type="checkbox"/><label class="toctree-toggle" for="toctree-checkbox-4"><i class="fa-solid fa-chevron-down"></i></label><ul>
@@ -516,7 +517,6 @@ <h2> Contents </h2>
516517
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#frozen-true-make-your-data-classes-read-only">3.7.2. frozen=True: Make Your Data Classes Read-Only</a></li>
517518
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#compare-between-two-data-classes">3.7.3. Compare Between Two Data Classes</a></li>
518519
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#post-init-add-init-method-to-a-data-class">3.7.4. Post-init: Add Init Method to a Data Class</a></li>
519-
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#simplify-data-validation-with-pydantic">3.7.5. Simplify Data Validation with Pydantic</a></li>
520520
</ul>
521521
</nav>
522522
</div>
@@ -752,46 +752,6 @@ <h2><span class="section-number">3.7.4. </span>Post-init: Add Init Method to a D
752752
</div>
753753
</div>
754754
</section>
755-
<section id="simplify-data-validation-with-pydantic">
756-
<h2><span class="section-number">3.7.5. </span>Simplify Data Validation with Pydantic<a class="headerlink" href="#simplify-data-validation-with-pydantic" title="Permalink to this heading">#</a></h2>
757-
<p>Dataclasses require manual implementation of validation.</p>
758-
<p>On the other hand, Pydantic offers built-in validation that automatically validates data and provides informative error messages. This makes Pydantic particularly useful when working with data from external sources.</p>
759-
<div class="cell docutils container">
760-
<div class="cell_input docutils container">
761-
<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pydantic</span> <span class="kn">import</span> <span class="n">BaseModel</span>
762-
763-
764-
<span class="k">class</span> <span class="nc">Dog</span><span class="p">(</span><span class="n">BaseModel</span><span class="p">):</span>
765-
<span class="n">names</span><span class="p">:</span> <span class="nb">str</span>
766-
<span class="n">age</span><span class="p">:</span> <span class="nb">int</span>
767-
768-
769-
<span class="n">dog</span> <span class="o">=</span> <span class="n">Dog</span><span class="p">(</span><span class="n">names</span><span class="o">=</span><span class="s2">&quot;Bim&quot;</span><span class="p">,</span> <span class="n">age</span><span class="o">=</span><span class="s2">&quot;ten&quot;</span><span class="p">)</span>
770-
</pre></div>
771-
</div>
772-
</div>
773-
<div class="cell_output docutils container">
774-
<div class="output traceback highlight-ipythontb notranslate"><div class="highlight"><pre><span></span><span class="gt">---------------------------------------------------------------------------</span>
775-
<span class="ne">ValidationError</span><span class="g g-Whitespace"> </span>Traceback (most recent call last)
776-
<span class="n">Cell</span> <span class="n">In</span><span class="p">[</span><span class="mi">3</span><span class="p">],</span> <span class="n">line</span> <span class="mi">9</span>
777-
<span class="g g-Whitespace"> </span><span class="mi">5</span> <span class="n">names</span><span class="p">:</span> <span class="nb">str</span>
778-
<span class="g g-Whitespace"> </span><span class="mi">6</span> <span class="n">age</span><span class="p">:</span> <span class="nb">int</span>
779-
<span class="ne">----&gt; </span><span class="mi">9</span> <span class="n">dog</span> <span class="o">=</span> <span class="n">Dog</span><span class="p">(</span><span class="n">names</span><span class="o">=</span><span class="s2">&quot;Bim&quot;</span><span class="p">,</span> <span class="n">age</span><span class="o">=</span><span class="s2">&quot;ten&quot;</span><span class="p">)</span>
780-
781-
<span class="nn">File ~/book/venv/lib/python3.11/site-packages/pydantic/main.py:164,</span> in <span class="ni">BaseModel.__init__</span><span class="nt">(__pydantic_self__, **data)</span>
782-
<span class="g g-Whitespace"> </span><span class="mi">162</span> <span class="c1"># `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks</span>
783-
<span class="g g-Whitespace"> </span><span class="mi">163</span> <span class="n">__tracebackhide__</span> <span class="o">=</span> <span class="kc">True</span>
784-
<span class="ne">--&gt; </span><span class="mi">164</span> <span class="n">__pydantic_self__</span><span class="o">.</span><span class="n">__pydantic_validator__</span><span class="o">.</span><span class="n">validate_python</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">self_instance</span><span class="o">=</span><span class="n">__pydantic_self__</span><span class="p">)</span>
785-
786-
<span class="ne">ValidationError</span>: 1 validation error for Dog
787-
<span class="n">age</span>
788-
<span class="n">Input</span> <span class="n">should</span> <span class="n">be</span> <span class="n">a</span> <span class="n">valid</span> <span class="n">integer</span><span class="p">,</span> <span class="n">unable</span> <span class="n">to</span> <span class="n">parse</span> <span class="n">string</span> <span class="k">as</span> <span class="n">an</span> <span class="n">integer</span> <span class="p">[</span><span class="nb">type</span><span class="o">=</span><span class="n">int_parsing</span><span class="p">,</span> <span class="n">input_value</span><span class="o">=</span><span class="s1">&#39;ten&#39;</span><span class="p">,</span> <span class="n">input_type</span><span class="o">=</span><span class="nb">str</span><span class="p">]</span>
789-
<span class="n">For</span> <span class="n">further</span> <span class="n">information</span> <span class="n">visit</span> <span class="n">https</span><span class="p">:</span><span class="o">//</span><span class="n">errors</span><span class="o">.</span><span class="n">pydantic</span><span class="o">.</span><span class="n">dev</span><span class="o">/</span><span class="mf">2.5</span><span class="o">/</span><span class="n">v</span><span class="o">/</span><span class="n">int_parsing</span>
790-
</pre></div>
791-
</div>
792-
</div>
793-
</div>
794-
</section>
795755
</section>
796756

797757
<script type="text/x-thebe-config">
@@ -861,7 +821,6 @@ <h2><span class="section-number">3.7.5. </span>Simplify Data Validation with Pyd
861821
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#frozen-true-make-your-data-classes-read-only">3.7.2. frozen=True: Make Your Data Classes Read-Only</a></li>
862822
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#compare-between-two-data-classes">3.7.3. Compare Between Two Data Classes</a></li>
863823
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#post-init-add-init-method-to-a-data-class">3.7.4. Post-init: Add Init Method to a Data Class</a></li>
864-
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#simplify-data-validation-with-pydantic">3.7.5. Simplify Data Validation with Pydantic</a></li>
865824
</ul>
866825
</nav></div>
867826

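Section 3.7.5, removed from this page in the hunk above, states that dataclasses require manual validation. A minimal sketch of what such a hand-written check might look like, reusing the `Dog` fields from that section; the `__post_init__` check and its error message are illustrative assumptions, not code from the book.

```python
from dataclasses import dataclass


@dataclass
class Dog:
    names: str
    age: int

    def __post_init__(self):
        # Dataclasses do not enforce type hints, so check the field by hand.
        if not isinstance(self.age, int):
            raise TypeError(f"age must be an int, got {type(self.age).__name__}")


try:
    Dog(names="Bim", age="ten")
except TypeError as err:
    message = str(err)
    print(message)
```

Every field that needs checking requires its own branch in `__post_init__`, which is the overhead the Pydantic `BaseModel` version avoids.
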