|
101 | 101 | <link rel="index" title="Index" href="../genindex.html" />
|
102 | 102 | <link rel="search" title="Search" href="../search.html" />
|
103 | 103 | <link rel="next" title="7. Cool Tools" href="../Chapter6/Chapter6.html" />
|
104 |
| - <link rel="prev" title="6.15. PySpark" href="spark.html" /> |
| 104 | + <link rel="prev" title="6.15. 3 Powerful Ways to Create PySpark DataFrames" href="spark.html" /> |
105 | 105 | <meta name="viewport" content="width=device-width, initial-scale=1"/>
|
106 | 106 | <meta name="docsearch:language" content="en"/>
|
107 | 107 | </head>
|
|
211 | 211 | <li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/get_elements.html">2.3.1. Get Elements</a></li>
|
212 | 212 | <li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/unpack_iterables.html">2.3.2. Unpack Iterables</a></li>
|
213 | 213 | <li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/join_iterable.html">2.3.3. Join Iterables</a></li>
|
214 |
| -<li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/interaction_between_2_lists.html">2.3.4. Interaction Between 2 Lists</a></li> |
215 |
| -<li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/apply_functions_to_elements.html">2.3.5. Apply Functions to Elements in a List</a></li> |
| 214 | +<li class="toctree-l3"><a class="reference internal" href="../Chapter1/list/apply_functions_to_elements.html">2.3.4. Apply Functions to Elements in a List</a></li> |
216 | 215 | </ul>
|
217 | 216 | </li>
|
218 |
| -<li class="toctree-l2"><a class="reference internal" href="../Chapter1/dictionary.html">2.4. Dictionary</a></li> |
219 |
| -<li class="toctree-l2"><a class="reference internal" href="../Chapter1/function.html">2.5. Function</a></li> |
220 |
| -<li class="toctree-l2"><a class="reference internal" href="../Chapter1/class.html">2.6. Classes</a></li> |
221 |
| -<li class="toctree-l2"><a class="reference internal" href="../Chapter1/datetime.html">2.7. Datetime</a></li> |
222 |
| -<li class="toctree-l2"><a class="reference internal" href="../Chapter1/code_speed.html">2.8. Code Speed</a></li> |
223 |
| -<li class="toctree-l2"><a class="reference internal" href="../Chapter1/good_practices.html">2.9. Good Python Practices</a></li> |
224 |
| -<li class="toctree-l2"><a class="reference internal" href="../Chapter1/python_new_features.html">2.10. New Features in Python</a></li> |
| 217 | +<li class="toctree-l2"><a class="reference internal" href="../Chapter1/set.html">2.4. Set</a></li> |
| 218 | +<li class="toctree-l2"><a class="reference internal" href="../Chapter1/dictionary.html">2.5. Dictionary</a></li> |
| 219 | +<li class="toctree-l2"><a class="reference internal" href="../Chapter1/function.html">2.6. Function</a></li> |
| 220 | +<li class="toctree-l2"><a class="reference internal" href="../Chapter1/class.html">2.7. Classes</a></li> |
| 221 | +<li class="toctree-l2"><a class="reference internal" href="../Chapter1/datetime.html">2.8. Datetime</a></li> |
| 222 | +<li class="toctree-l2"><a class="reference internal" href="../Chapter1/code_speed.html">2.9. Code Speed</a></li> |
| 223 | +<li class="toctree-l2"><a class="reference internal" href="../Chapter1/good_practices.html">2.10. Good Python Practices</a></li> |
| 224 | +<li class="toctree-l2"><a class="reference internal" href="../Chapter1/python_new_features.html">2.11. New Features in Python</a></li> |
225 | 225 | </ul>
|
226 | 226 | </li>
|
227 | 227 | <li class="toctree-l1 has-children"><a class="reference internal" href="../Chapter2/Chapter2.html">3. Python Utility Libraries</a><input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" type="checkbox"/><label class="toctree-toggle" for="toctree-checkbox-3"><i class="fa-solid fa-chevron-down"></i></label><ul>
|
|
271 | 271 | <li class="toctree-l2"><a class="reference internal" href="better_pandas.html">6.12. Better Pandas</a></li>
|
272 | 272 | <li class="toctree-l2"><a class="reference internal" href="testing.html">6.13. Testing</a></li>
|
273 | 273 | <li class="toctree-l2"><a class="reference internal" href="SQL.html">6.14. SQL Libraries</a></li>
|
274 |
| -<li class="toctree-l2"><a class="reference internal" href="spark.html">6.15. PySpark</a></li> |
| 274 | +<li class="toctree-l2"><a class="reference internal" href="spark.html">6.15. 3 Powerful Ways to Create PySpark DataFrames</a></li> |
275 | 275 | <li class="toctree-l2 current active"><a class="current reference internal" href="#">6.16. Large Language Model (LLM)</a></li>
|
276 | 276 | </ul>
|
277 | 277 | </li>
|
@@ -855,6 +855,69 @@ <h2><span class="section-number">6.16.4. </span>Maximize Accuracy and Relevance
|
855 | 855 | </div>
|
856 | 856 | </div>
|
857 | 857 | <p><a class="reference external" href="https://bit.ly/4awfNhg">Link to Mirascope</a>.</p>
|
| 858 | +<div class="cell docutils container"> |
| 859 | +<div class="cell_input docutils container"> |
| 860 | +<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="o">!</span>pip<span class="w"> </span>install<span class="w"> </span>chromadb<span class="w"> </span><span class="s1">'numpy<2'</span> |
| 861 | +</pre></div> |
| 862 | +</div> |
| 863 | +</div> |
| 864 | +</div> |
| 865 | +<p>Managing and querying large collections of text with traditional databases or simple keyword search often yields poor semantic matches and requires complex implementation. This makes it difficult to build AI applications that need to find contextually similar content.</p> |
| 866 | +<div class="cell docutils container"> |
| 867 | +<div class="cell_input docutils container"> |
| 868 | +<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="c1"># Traditional approach with basic text search</span> |
| 869 | +<span class="n">documents</span> <span class="o">=</span> <span class="p">[</span> |
| 870 | + <span class="s2">"The weather is great today"</span><span class="p">,</span> |
| 871 | + <span class="s2">"The climate is excellent"</span><span class="p">,</span> |
| 872 | + <span class="s2">"Machine learning models are fascinating"</span><span class="p">,</span> |
| 873 | +<span class="p">]</span> |
| 874 | + |
| 875 | +<span class="c1"># Search by exact match or simple substring</span> |
| 876 | +<span class="n">query</span> <span class="o">=</span> <span class="s2">"How's the weather?"</span> |
| 877 | +<span class="n">results</span> <span class="o">=</span> <span class="p">[</span><span class="n">doc</span> <span class="k">for</span> <span class="n">doc</span> <span class="ow">in</span> <span class="n">documents</span> <span class="k">if</span> <span class="s2">"weather"</span> <span class="ow">in</span> <span class="n">doc</span><span class="o">.</span><span class="n">lower</span><span class="p">()]</span> |
| 878 | + |
| 879 | +<span class="c1"># Only finds documents containing the exact word "weather"; misses semantically similar ones</span> |
| 880 | +<span class="nb">print</span><span class="p">(</span><span class="n">results</span><span class="p">)</span> |
| 881 | +</pre></div> |
| 882 | +</div> |
| 883 | +</div> |
| 884 | +<div class="cell_output docutils container"> |
| 885 | +<div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>['The weather is great today'] |
| 886 | +</pre></div> |
| 887 | +</div> |
| 888 | +</div> |
| 889 | +</div> |
| 890 | +<p>Chroma lets you store and query documents by their semantic meaning using embeddings. It handles embedding creation and similarity search automatically, which makes it simple to build AI applications with semantic search capabilities.</p> |
| 891 | +<div class="cell docutils container"> |
| 892 | +<div class="cell_input docutils container"> |
| 893 | +<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">chromadb</span> |
| 894 | + |
| 895 | +<span class="c1"># Initialize client and collection</span> |
| 896 | +<span class="n">client</span> <span class="o">=</span> <span class="n">chromadb</span><span class="o">.</span><span class="n">Client</span><span class="p">()</span> |
| 897 | +<span class="n">collection</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="n">create_collection</span><span class="p">(</span><span class="s2">"documents"</span><span class="p">)</span> |
| 898 | + |
| 899 | +<span class="c1"># Add documents</span> |
| 900 | +<span class="n">collection</span><span class="o">.</span><span class="n">add</span><span class="p">(</span> |
| 901 | + <span class="n">documents</span><span class="o">=</span><span class="p">[</span> |
| 902 | + <span class="s2">"The weather is great today"</span><span class="p">,</span> |
| 903 | + <span class="s2">"The climate is excellent"</span><span class="p">,</span> |
| 904 | + <span class="s2">"Machine learning models are fascinating"</span> |
| 905 | + <span class="p">],</span> |
| 906 | + <span class="n">ids</span><span class="o">=</span><span class="p">[</span><span class="s2">"doc1"</span><span class="p">,</span> <span class="s2">"doc2"</span><span class="p">,</span> <span class="s2">"doc3"</span><span class="p">]</span> |
| 907 | +<span class="p">)</span> |
| 908 | + |
| 909 | +<span class="c1"># Query semantically similar documents</span> |
| 910 | +<span class="n">results</span> <span class="o">=</span> <span class="n">collection</span><span class="o">.</span><span class="n">query</span><span class="p">(</span> |
| 911 | + <span class="n">query_texts</span><span class="o">=</span><span class="p">[</span><span class="s2">"How's the weather?"</span><span class="p">],</span> |
| 912 | + <span class="n">n_results</span><span class="o">=</span><span class="mi">2</span> |
| 913 | +<span class="p">)</span> |
| 914 | +<span class="c1"># Expected to return the weather and climate documents, which are semantically closest to the query</span> |
| 915 | +<span class="nb">print</span><span class="p">(</span><span class="n">results</span><span class="p">[</span><span class="s1">'documents'</span><span class="p">])</span> |
| 916 | +</pre></div> |
| 917 | +</div> |
| 918 | +</div> |
| 919 | +</div> |
| 920 | +<p>The example shows how Chroma automatically converts text into embeddings and finds semantically similar documents, even when they don’t share exact words. This makes it much easier to build applications that can understand the meaning of text, not just match keywords.</p> |
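To make the idea of "semantically similar" more concrete, here is a minimal sketch of ranking documents by embedding similarity. It does not use Chroma: the three-dimensional vectors are invented by hand purely for illustration, and the helper names `cosine_similarity` and `top_matches` are hypothetical. Cosine similarity is one common choice of metric; the embedding model and distance metric that Chroma actually uses are not shown here.

```python
import math

# Hypothetical, hand-assigned 3-dimensional "embeddings" (illustration only).
# A real embedding model produces much longer vectors.
toy_embeddings = {
    "The weather is great today": [0.9, 0.1, 0.0],
    "The climate is excellent": [0.8, 0.2, 0.1],
    "Machine learning models are fascinating": [0.0, 0.1, 0.9],
}

# Hypothetical embedding for the query "How's the weather?"
query_embedding = [0.85, 0.15, 0.05]


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_matches(query_vec, embeddings, n_results=2):
    """Rank documents by similarity to the query and keep the best n_results."""
    ranked = sorted(
        embeddings,
        key=lambda doc: cosine_similarity(query_vec, embeddings[doc]),
        reverse=True,
    )
    return ranked[:n_results]


matches = top_matches(query_embedding, toy_embeddings)
print(matches)
```

With these toy vectors, the weather and climate documents rank above the machine learning document even though only one of them contains the word "weather", which is the behavior keyword search cannot provide.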
858 | 921 | </section>
|
859 | 922 | </section>
|
860 | 923 |
|
@@ -894,7 +957,7 @@ <h2><span class="section-number">6.16.4. </span>Maximize Accuracy and Relevance
|
894 | 957 | <i class="fa-solid fa-angle-left"></i>
|
895 | 958 | <div class="prev-next-info">
|
896 | 959 | <p class="prev-next-subtitle">previous</p>
|
897 |
| - <p class="prev-next-title"><span class="section-number">6.15. </span>PySpark</p> |
| 960 | + <p class="prev-next-title"><span class="section-number">6.15. </span>3 Powerful Ways to Create PySpark DataFrames</p> |
898 | 961 | </div>
|
899 | 962 | </a>
|
900 | 963 | <a class="right-next"
|
|