Commit 5bdd02d

add content
1 parent bab5790 commit 5bdd02d

File tree

11 files changed

+339
-64
lines changed

Chapter5/llm.ipynb

Lines changed: 100 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -505,11 +505,109 @@
505505
"source": [
506506
"[Link to Mirascope](https://bit.ly/4awfNhg)."
507507
]
508+
},
509+
{
510+
"cell_type": "code",
511+
"execution_count": null,
512+
"id": "c41b97ce-89fa-4789-ab7a-684bb6e86544",
513+
"metadata": {},
514+
"outputs": [],
515+
"source": [
516+
"!pip install chromadb 'numpy<2'"
517+
]
518+
},
519+
{
520+
"cell_type": "markdown",
521+
"id": "560d4baa-ac88-4e0c-b984-d3a0db13b140",
522+
"metadata": {},
523+
"source": [
524+
"Querying large collections of text with traditional databases or simple keyword search yields poor semantic matches, and working around that adds implementation complexity. This makes it hard to build AI applications that need to find contextually similar content."
525+
]
526+
},
527+
{
528+
"cell_type": "code",
529+
"execution_count": 1,
530+
"id": "54481d9c-2dde-4f6a-96ba-0fe7166e5800",
531+
"metadata": {},
532+
"outputs": [
533+
{
534+
"name": "stdout",
535+
"output_type": "stream",
536+
"text": [
537+
"['The weather is great today']\n"
538+
]
539+
}
540+
],
541+
"source": [
542+
"# Traditional approach with basic text search\n",
543+
"documents = [\n",
544+
" \"The weather is great today\",\n",
545+
" \"The climate is excellent\",\n",
546+
" \"Machine learning models are fascinating\",\n",
547+
"]\n",
548+
"\n",
549+
"# Search by exact match or simple substring\n",
550+
"query = \"How's the weather?\"\n",
551+
"results = [doc for doc in documents if \"weather\" in doc.lower()]\n",
552+
"\n",
553+
"# Finds only documents containing the literal word \"weather\"; misses semantically similar ones\n",
554+
"print(results)"
555+
]
556+
},
557+
{
558+
"cell_type": "markdown",
559+
"id": "154ccc52-93a4-4c1c-9e53-056bf27af0eb",
560+
"metadata": {},
561+
"source": [
562+
"Chroma lets you store documents and query them by semantic meaning using embeddings. It handles embedding creation and similarity search automatically, which makes it simple to add semantic search to AI applications.\n",
563+
"\n"
564+
]
565+
},
566+
{
567+
"cell_type": "code",
568+
"execution_count": 1,
569+
"id": "93f64f0f-e706-4b43-82d4-80c9fa60ed52",
570+
"metadata": {},
571+
"outputs": [],
572+
"source": [
573+
"import chromadb\n",
574+
"\n",
575+
"# Initialize client and collection\n",
576+
"client = chromadb.Client()\n",
577+
"collection = client.create_collection(\"documents\")\n",
578+
"\n",
579+
"# Add documents\n",
580+
"collection.add(\n",
581+
" documents=[\n",
582+
" \"The weather is great today\",\n",
583+
" \"The climate is excellent\",\n",
584+
" \"Machine learning models are fascinating\"\n",
585+
" ],\n",
586+
" ids=[\"doc1\", \"doc2\", \"doc3\"]\n",
587+
")\n",
588+
"\n",
589+
"# Query semantically similar documents\n",
590+
"results = collection.query(\n",
591+
" query_texts=[\"How's the weather?\"],\n",
592+
" n_results=2\n",
593+
")\n",
594+
"# Expect the weather and climate documents: they are closest in meaning to the query\n",
595+
"print(results['documents'])"
596+
]
597+
},
598+
{
599+
"cell_type": "markdown",
600+
"id": "a6e812d7-74e1-49ed-9f32-f53d84932e48",
601+
"metadata": {},
602+
"source": [
603+
"The example shows how Chroma automatically converts text into embeddings and finds semantically similar documents, even when they don't share exact words. This makes it much easier to build applications that can understand the meaning of text, not just match keywords.\n",
604+
"\n"
605+
]
508606
}
509607
],
510608
"metadata": {
511609
"kernelspec": {
512-
"display_name": "venv",
610+
"display_name": "Python 3 (ipykernel)",
513611
"language": "python",
514612
"name": "python3"
515613
},
@@ -523,7 +621,7 @@
523621
"name": "python",
524622
"nbconvert_exporter": "python",
525623
"pygments_lexer": "ipython3",
526-
"version": "3.11.4"
624+
"version": "3.11.6"
527625
}
528626
},
529627
"nbformat": 4,

Chapter7/jupyter_notebook.ipynb

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1328,7 +1328,7 @@
13281328
"name": "python",
13291329
"nbconvert_exporter": "python",
13301330
"pygments_lexer": "ipython3",
1331-
"version": "3.11.2"
1331+
"version": "3.11.6"
13321332
},
13331333
"toc": {
13341334
"base_numbering": 1,

docs/Chapter5/llm.html

Lines changed: 75 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -101,7 +101,7 @@
101101
<link rel="index" title="Index" href="../genindex.html" />
102102
<link rel="search" title="Search" href="../search.html" />
103103
<link rel="next" title="7. Cool Tools" href="../Chapter6/Chapter6.html" />
104-
<link rel="prev" title="6.15. PySpark" href="spark.html" />
104+
<link rel="prev" title="6.15. 3 Powerful Ways to Create PySpark DataFrames" href="spark.html" />
105105
<meta name="viewport" content="width=device-width, initial-scale=1"/>
106106
<meta name="docsearch:language" content="en"/>
107107
</head>
@@ -211,17 +211,17 @@
211211
<li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/get_elements.html">2.3.1. Get Elements</a></li>
212212
<li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/unpack_iterables.html">2.3.2. Unpack Iterables</a></li>
213213
<li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/join_iterable.html">2.3.3. Join Iterables</a></li>
214-
<li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/interaction_between_2_lists.html">2.3.4. Interaction Between 2 Lists</a></li>
215-
<li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/apply_functions_to_elements.html">2.3.5. Apply Functions to Elements in a List</a></li>
214+
<li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/apply_functions_to_elements.html">2.3.4. Apply Functions to Elements in a List</a></li>
216215
</ul>
217216
</li>
218-
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/dictionary.html">2.4. Dictionary</a></li>
219-
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/function.html">2.5. Function</a></li>
220-
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/class.html">2.6. Classes</a></li>
221-
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/datetime.html">2.7. Datetime</a></li>
222-
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/code_speed.html">2.8. Code Speed</a></li>
223-
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/good_practices.html">2.9. Good Python Practices</a></li>
224-
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/python_new_features.html">2.10. New Features in Python</a></li>
217+
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/set.html">2.4. Set</a></li>
218+
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/dictionary.html">2.5. Dictionary</a></li>
219+
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/function.html">2.6. Function</a></li>
220+
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/class.html">2.7. Classes</a></li>
221+
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/datetime.html">2.8. Datetime</a></li>
222+
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/code_speed.html">2.9. Code Speed</a></li>
223+
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/good_practices.html">2.10. Good Python Practices</a></li>
224+
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/python_new_features.html">2.11. New Features in Python</a></li>
225225
</ul>
226226
</li>
227227
<li class="toctree-l1 has-children"><a class="reference internal" href="../Chapter2/Chapter2.html">3. Python Utility Libraries</a><input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" type="checkbox"/><label class="toctree-toggle" for="toctree-checkbox-3"><i class="fa-solid fa-chevron-down"></i></label><ul>
@@ -271,7 +271,7 @@
271271
<li class="toctree-l2"><a class="reference internal" href="better_pandas.html">6.12. Better Pandas</a></li>
272272
<li class="toctree-l2"><a class="reference internal" href="testing.html">6.13. Testing</a></li>
273273
<li class="toctree-l2"><a class="reference internal" href="SQL.html">6.14. SQL Libraries</a></li>
274-
<li class="toctree-l2"><a class="reference internal" href="spark.html">6.15. PySpark</a></li>
274+
<li class="toctree-l2"><a class="reference internal" href="spark.html">6.15. 3 Powerful Ways to Create PySpark DataFrames</a></li>
275275
<li class="toctree-l2 current active"><a class="current reference internal" href="#">6.16. Large Language Model (LLM)</a></li>
276276
</ul>
277277
</li>
@@ -855,6 +855,69 @@ <h2><span class="section-number">6.16.4. </span>Maximize Accuracy and Relevance
855855
</div>
856856
</div>
857857
<p><a class="reference external" href="https://bit.ly/4awfNhg">Link to Mirascope</a>.</p>
858+
<div class="cell docutils container">
859+
<div class="cell_input docutils container">
860+
<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="o">!</span>pip<span class="w"> </span>install<span class="w"> </span>chromadb<span class="w"> </span><span class="s1">&#39;numpy&lt;2&#39;</span>
861+
</pre></div>
862+
</div>
863+
</div>
864+
</div>
865+
<p>Querying large collections of text with traditional databases or simple keyword search yields poor semantic matches, and working around that adds implementation complexity. This makes it hard to build AI applications that need to find contextually similar content.</p>
866+
<div class="cell docutils container">
867+
<div class="cell_input docutils container">
868+
<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="c1"># Traditional approach with basic text search</span>
869+
<span class="n">documents</span> <span class="o">=</span> <span class="p">[</span>
870+
<span class="s2">&quot;The weather is great today&quot;</span><span class="p">,</span>
871+
<span class="s2">&quot;The climate is excellent&quot;</span><span class="p">,</span>
872+
<span class="s2">&quot;Machine learning models are fascinating&quot;</span><span class="p">,</span>
873+
<span class="p">]</span>
874+
875+
<span class="c1"># Search by exact match or simple substring</span>
876+
<span class="n">query</span> <span class="o">=</span> <span class="s2">&quot;How&#39;s the weather?&quot;</span>
877+
<span class="n">results</span> <span class="o">=</span> <span class="p">[</span><span class="n">doc</span> <span class="k">for</span> <span class="n">doc</span> <span class="ow">in</span> <span class="n">documents</span> <span class="k">if</span> <span class="s2">&quot;weather&quot;</span> <span class="ow">in</span> <span class="n">doc</span><span class="o">.</span><span class="n">lower</span><span class="p">()]</span>
878+
879+
<span class="c1"># Finds only documents containing the literal word &quot;weather&quot;; misses semantically similar ones</span>
880+
<span class="nb">print</span><span class="p">(</span><span class="n">results</span><span class="p">)</span>
881+
</pre></div>
882+
</div>
883+
</div>
884+
<div class="cell_output docutils container">
885+
<div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>[&#39;The weather is great today&#39;]
886+
</pre></div>
887+
</div>
888+
</div>
889+
</div>
890+
<p>Chroma lets you store documents and query them by semantic meaning using embeddings. It handles embedding creation and similarity search automatically, which makes it simple to add semantic search to AI applications.</p>
891+
<div class="cell docutils container">
892+
<div class="cell_input docutils container">
893+
<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">chromadb</span>
894+
895+
<span class="c1"># Initialize client and collection</span>
896+
<span class="n">client</span> <span class="o">=</span> <span class="n">chromadb</span><span class="o">.</span><span class="n">Client</span><span class="p">()</span>
897+
<span class="n">collection</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="n">create_collection</span><span class="p">(</span><span class="s2">&quot;documents&quot;</span><span class="p">)</span>
898+
899+
<span class="c1"># Add documents</span>
900+
<span class="n">collection</span><span class="o">.</span><span class="n">add</span><span class="p">(</span>
901+
<span class="n">documents</span><span class="o">=</span><span class="p">[</span>
902+
<span class="s2">&quot;The weather is great today&quot;</span><span class="p">,</span>
903+
<span class="s2">&quot;The climate is excellent&quot;</span><span class="p">,</span>
904+
<span class="s2">&quot;Machine learning models are fascinating&quot;</span>
905+
<span class="p">],</span>
906+
<span class="n">ids</span><span class="o">=</span><span class="p">[</span><span class="s2">&quot;doc1&quot;</span><span class="p">,</span> <span class="s2">&quot;doc2&quot;</span><span class="p">,</span> <span class="s2">&quot;doc3&quot;</span><span class="p">]</span>
907+
<span class="p">)</span>
908+
909+
<span class="c1"># Query semantically similar documents</span>
910+
<span class="n">results</span> <span class="o">=</span> <span class="n">collection</span><span class="o">.</span><span class="n">query</span><span class="p">(</span>
911+
<span class="n">query_texts</span><span class="o">=</span><span class="p">[</span><span class="s2">&quot;How&#39;s the weather?&quot;</span><span class="p">],</span>
912+
<span class="n">n_results</span><span class="o">=</span><span class="mi">2</span>
913+
<span class="p">)</span>
914+
<span class="c1"># Expect the weather and climate documents: they are closest in meaning to the query</span>
915+
<span class="nb">print</span><span class="p">(</span><span class="n">results</span><span class="p">[</span><span class="s1">&#39;documents&#39;</span><span class="p">])</span>
916+
</pre></div>
917+
</div>
918+
</div>
919+
</div>
920+
<p>The example shows how Chroma automatically converts text into embeddings and finds semantically similar documents, even when they don’t share exact words. This makes it much easier to build applications that can understand the meaning of text, not just match keywords.</p>
858921
</section>
859922
</section>
860923

@@ -894,7 +957,7 @@ <h2><span class="section-number">6.16.4. </span>Maximize Accuracy and Relevance
894957
<i class="fa-solid fa-angle-left"></i>
895958
<div class="prev-next-info">
896959
<p class="prev-next-subtitle">previous</p>
897-
<p class="prev-next-title"><span class="section-number">6.15. </span>PySpark</p>
960+
<p class="prev-next-title"><span class="section-number">6.15. </span>3 Powerful Ways to Create PySpark DataFrames</p>
898961
</div>
899962
</a>
900963
<a class="right-next"

docs/Chapter7/Chapter7.html

Lines changed: 10 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -211,17 +211,17 @@
211211
<li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/get_elements.html">2.3.1. Get Elements</a></li>
212212
<li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/unpack_iterables.html">2.3.2. Unpack Iterables</a></li>
213213
<li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/join_iterable.html">2.3.3. Join Iterables</a></li>
214-
<li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/interaction_between_2_lists.html">2.3.4. Interaction Between 2 Lists</a></li>
215-
<li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/apply_functions_to_elements.html">2.3.5. Apply Functions to Elements in a List</a></li>
214+
<li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/apply_functions_to_elements.html">2.3.4. Apply Functions to Elements in a List</a></li>
216215
</ul>
217216
</li>
218-
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/dictionary.html">2.4. Dictionary</a></li>
219-
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/function.html">2.5. Function</a></li>
220-
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/class.html">2.6. Classes</a></li>
221-
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/datetime.html">2.7. Datetime</a></li>
222-
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/code_speed.html">2.8. Code Speed</a></li>
223-
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/good_practices.html">2.9. Good Python Practices</a></li>
224-
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/python_new_features.html">2.10. New Features in Python</a></li>
217+
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/set.html">2.4. Set</a></li>
218+
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/dictionary.html">2.5. Dictionary</a></li>
219+
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/function.html">2.6. Function</a></li>
220+
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/class.html">2.7. Classes</a></li>
221+
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/datetime.html">2.8. Datetime</a></li>
222+
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/code_speed.html">2.9. Code Speed</a></li>
223+
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/good_practices.html">2.10. Good Python Practices</a></li>
224+
<li class="toctree-l2"><a class="reference internal" href="../Chapter1/python_new_features.html">2.11. New Features in Python</a></li>
225225
</ul>
226226
</li>
227227
<li class="toctree-l1 has-children"><a class="reference internal" href="../Chapter2/Chapter2.html">3. Python Utility Libraries</a><input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" type="checkbox"/><label class="toctree-toggle" for="toctree-checkbox-3"><i class="fa-solid fa-chevron-down"></i></label><ul>
@@ -271,7 +271,7 @@
271271
<li class="toctree-l2"><a class="reference internal" href="../Chapter5/better_pandas.html">6.12. Better Pandas</a></li>
272272
<li class="toctree-l2"><a class="reference internal" href="../Chapter5/testing.html">6.13. Testing</a></li>
273273
<li class="toctree-l2"><a class="reference internal" href="../Chapter5/SQL.html">6.14. SQL Libraries</a></li>
274-
<li class="toctree-l2"><a class="reference internal" href="../Chapter5/spark.html">6.15. PySpark</a></li>
274+
<li class="toctree-l2"><a class="reference internal" href="../Chapter5/spark.html">6.15. 3 Powerful Ways to Create PySpark DataFrames</a></li>
275275
<li class="toctree-l2"><a class="reference internal" href="../Chapter5/llm.html">6.16. Large Language Model (LLM)</a></li>
276276
</ul>
277277
</li>
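
Note on the Chroma cells added to Chapter5/llm.ipynb in this commit: the notebook contrasts substring search with embedding-based search but leaves the ranking step implicit. The following is a minimal sketch of that idea, not Chroma's implementation. The `CONCEPTS` table, the `embed` function, and the two concept axes are hypothetical stand-ins for a learned embedding model; only the three example sentences and the query come from the notebook.

```python
import math

# Hypothetical toy "embedding": map words onto hand-picked concept axes.
# Real embedding models learn dense vectors; this table is only an illustration.
CONCEPTS = {
    "weather": "atmosphere",
    "climate": "atmosphere",
    "machine": "ml",
    "learning": "ml",
    "models": "ml",
}
AXES = ["atmosphere", "ml"]


def embed(text):
    """Count how many words of `text` fall on each concept axis."""
    words = [w.strip("?.,!'").lower() for w in text.split()]
    return [sum(1 for w in words if CONCEPTS.get(w) == axis) for axis in AXES]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, documents, k=2):
    """Return the k documents whose vectors are most similar to the query's."""
    q = embed(query)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(q, embed(doc)),
        reverse=True,
    )
    return ranked[:k]


documents = [
    "The weather is great today",
    "The climate is excellent",
    "Machine learning models are fascinating",
]

results = top_k("How's the weather?", documents, k=2)
print(results)
```

In this toy setup the weather and climate sentences land on the same axis as the query, so both rank ahead of the machine-learning sentence even though only one of them contains the word "weather". A vector store such as Chroma applies the same rank-by-similarity idea using vectors produced by an embedding model rather than a lookup table.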
