|
234 | 234 | <li class="toctree-l2"><a class="reference internal" href="../Chapter2/dataclasses.html">3.7. Data Classes</a></li>
|
235 | 235 | <li class="toctree-l2"><a class="reference internal" href="../Chapter2/typing.html">3.8. Typing</a></li>
|
236 | 236 | <li class="toctree-l2"><a class="reference internal" href="../Chapter2/pathlib.html">3.9. pathlib</a></li>
|
| 237 | +<li class="toctree-l2"><a class="reference internal" href="../Chapter2/pydantic.html">3.10. Pydantic</a></li> |
237 | 238 | </ul>
|
238 | 239 | </li>
|
239 | 240 | <li class="toctree-l1 has-children"><a class="reference internal" href="../Chapter3/Chapter3.html">4. Pandas</a><input class="toctree-checkbox" id="toctree-checkbox-4" name="toctree-checkbox-4" type="checkbox"/><label class="toctree-toggle" for="toctree-checkbox-4"><i class="fa-solid fa-chevron-down"></i></label><ul>
|
@@ -527,6 +528,7 @@ <h2> Contents </h2>
|
527 | 528 | <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#pytube-a-lightweight-python-library-for-downloading-youtube-videos">7.2.13. PyTube: A Lightweight Python Library for Downloading YouTube Videos</a></li>
|
528 | 529 | <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#limit-the-execution-time-of-a-function-call-with-prefect">7.2.14. Limit the Execution Time of a Function Call with Prefect</a></li>
|
529 | 530 | <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#retry-on-failure-with-prefect">7.2.15. Retry on Failure with Prefect</a></li>
|
| 531 | +<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#magika-detect-file-content-types-with-deep-learning">7.2.16. Magika: Detect File Content Types with Deep Learning</a></li> |
530 | 532 | </ul>
|
531 | 533 | </nav>
|
532 | 534 | </div>
|
@@ -1575,6 +1577,84 @@ <h2><span class="section-number">7.2.15. </span>Retry on Failure with Prefect<a
|
1575 | 1577 | </details>
|
1576 | 1578 | </div>
|
1577 | 1579 | </section>
|
| 1580 | +<section id="magika-detect-file-content-types-with-deep-learning"> |
| 1581 | +<h2><span class="section-number">7.2.16. </span>Magika: Detect File Content Types with Deep Learning<a class="headerlink" href="#magika-detect-file-content-types-with-deep-learning" title="Permalink to this heading">#</a></h2> |
| 1582 | +<div class="cell tag_hide-cell docutils container"> |
| 1583 | +<details class="hide above-input"> |
| 1584 | +<summary aria-label="Toggle hidden content"> |
| 1585 | +<span class="collapsed">Show code cell content</span> |
| 1586 | +<span class="expanded">Hide code cell content</span> |
| 1587 | +</summary> |
| 1588 | +<div class="cell_input docutils container"> |
| 1589 | +<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="o">!</span>pip<span class="w"> </span>install<span class="w"> </span>magika |
| 1590 | +</pre></div> |
| 1591 | +</div> |
| 1592 | +</div> |
| 1593 | +</details> |
| 1594 | +</div> |
| 1595 | +<p>Detecting a file's type from its content, rather than its name, helps catch malicious files disguised with false extensions, such as malware renamed to end in .jpg.</p> |
| 1596 | +<p>Magika, Google’s AI-powered file type detection tool, uses a deep learning model to identify content types. In the following example, every file has a misleading extension, yet Magika still identifies each one's actual content type.</p> |
| 1597 | +<div class="cell docutils container"> |
| 1598 | +<div class="cell_input docutils container"> |
| 1599 | +<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pathlib</span> <span class="kn">import</span> <span class="n">Path</span> |
| 1600 | +<span class="kn">import</span> <span class="nn">shutil</span> |
| 1601 | + |
| 1602 | +<span class="c1"># Define the directory where files will be created</span> |
| 1603 | +<span class="n">directory</span> <span class="o">=</span> <span class="n">Path</span><span class="p">(</span><span class="s2">"examples"</span><span class="p">)</span> |
| 1604 | + |
| 1605 | +<span class="c1"># Ensure the directory exists</span> |
| 1606 | +<span class="n">directory</span><span class="o">.</span><span class="n">mkdir</span><span class="p">(</span><span class="n">exist_ok</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span> |
| 1607 | + |
| 1608 | +<span class="c1"># Empty the directory if it is not empty</span> |
| 1609 | +<span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">directory</span><span class="o">.</span><span class="n">iterdir</span><span class="p">():</span> |
| 1610 | + <span class="k">if</span> <span class="n">item</span><span class="o">.</span><span class="n">is_dir</span><span class="p">():</span> |
| 1611 | + <span class="n">shutil</span><span class="o">.</span><span class="n">rmtree</span><span class="p">(</span><span class="n">item</span><span class="p">)</span> |
| 1612 | + <span class="k">else</span><span class="p">:</span> |
| 1613 | + <span class="n">item</span><span class="o">.</span><span class="n">unlink</span><span class="p">()</span> |
| 1614 | + |
| 1615 | +<span class="c1"># Define the filenames and their respective content</span> |
| 1616 | +<span class="n">files</span> <span class="o">=</span> <span class="p">[</span> |
| 1617 | + <span class="p">(</span><span class="s2">"plain_text.csv"</span><span class="p">,</span> <span class="s2">"This is a plain text file."</span><span class="p">),</span> |
| 1618 | + <span class="p">(</span><span class="s2">"csv.json"</span><span class="p">,</span> <span class="s2">"id,name,age</span><span class="se">\n</span><span class="s2">1,John Doe,30"</span><span class="p">),</span> |
| 1619 | + <span class="p">(</span><span class="s2">"json.xml"</span><span class="p">,</span> <span class="s1">'{"name": "John", "age": 30}'</span><span class="p">),</span> |
| 1620 | + <span class="p">(</span><span class="s2">"markdown.js"</span><span class="p">,</span> <span class="s2">"# Heading 1</span><span class="se">\n</span><span class="s2">Some text."</span><span class="p">),</span> |
| 1621 | + <span class="p">(</span><span class="s2">"python.ini"</span><span class="p">,</span> <span class="s1">'print("Hello, World!")'</span><span class="p">),</span> |
| 1622 | + <span class="p">(</span><span class="s2">"js.yml"</span><span class="p">,</span> <span class="s1">'console.log("Hello, World!");'</span><span class="p">),</span> |
| 1623 | + <span class="p">(</span><span class="s2">"yml.js"</span><span class="p">,</span> <span class="s2">"name: John</span><span class="se">\n</span><span class="s2">age: 30"</span><span class="p">),</span> |
| 1624 | +<span class="p">]</span> |
| 1625 | + |
| 1626 | +<span class="c1"># Create each file with the specified content</span> |
| 1627 | +<span class="k">for</span> <span class="n">filename</span><span class="p">,</span> <span class="n">content</span> <span class="ow">in</span> <span class="n">files</span><span class="p">:</span> |
| 1628 | + <span class="p">(</span><span class="n">directory</span> <span class="o">/</span> <span class="n">filename</span><span class="p">)</span><span class="o">.</span><span class="n">write_text</span><span class="p">(</span><span class="n">content</span><span class="p">)</span> |
| 1629 | + |
| 1630 | +<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Created </span><span class="si">{</span><span class="nb">len</span><span class="p">(</span><span class="n">files</span><span class="p">)</span><span class="si">}</span><span class="s2"> files in the '</span><span class="si">{</span><span class="n">directory</span><span class="si">}</span><span class="s2">' directory."</span><span class="p">)</span> |
| 1631 | +</pre></div> |
| 1632 | +</div> |
| 1633 | +</div> |
| 1634 | +<div class="cell_output docutils container"> |
| 1635 | +<div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>Created 7 files in the 'examples' directory. |
| 1636 | +</pre></div> |
| 1637 | +</div> |
| 1638 | +</div> |
| 1639 | +</div> |
| 1640 | +<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>magika<span class="w"> </span>-r<span class="w"> </span>examples |
| 1641 | +</pre></div> |
| 1642 | +</div> |
| 1643 | +<div class="cell tag_remove-input docutils container"> |
| 1644 | +<div class="cell_output docutils container"> |
| 1645 | +<div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span><span class=" -Color -Color-Bold -Color-Bold-Blue">examples/csv.json: CSV document (code)</span> |
| 1646 | +<span class=" -Color -Color-Bold -Color-Bold-Blue">examples/js.yml: JavaScript source (code)</span> |
| 1647 | +<span class=" -Color -Color-Bold -Color-Bold-Blue">examples/json.xml: JSON document (code)</span> |
| 1648 | +<span class=" -Color -Color-Bold -Color-Bold-White">examples/markdown.js: Markdown document (text)</span> |
| 1649 | +<span class=" -Color -Color-Bold -Color-Bold-White">examples/plain_text.csv: Generic text document (text)</span> |
| 1650 | +<span class=" -Color -Color-Bold -Color-Bold-Blue">examples/python.ini: Python source (code)</span> |
| 1651 | +<span class=" -Color -Color-Bold -Color-Bold-Blue">examples/yml.js: YAML source (code)</span> |
| 1652 | +</pre></div> |
| 1653 | +</div> |
| 1654 | +</div> |
| 1655 | +</div> |
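For contrast with Magika's learned model, the sketch below shows the simpler, traditional way to catch a disguised file such as the .jpg example mentioned earlier: compare the type implied by the extension with a type inferred from the file's leading "magic" bytes. This is not how Magika works internally; the signature table, the extension map, and the helper names `sniff_type` and `extension_mismatch` are hypothetical and chosen only for illustration.

```python
from pathlib import Path
import tempfile

# Hypothetical, hand-picked signature table (an assumption for this sketch,
# not Magika's method): leading bytes -> content type label.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",
    b"MZ": "windows executable",
}

# Hypothetical map from file extension to the type it claims to be.
EXTENSION_TYPES = {
    ".jpg": "jpeg",
    ".jpeg": "jpeg",
    ".png": "png",
    ".pdf": "pdf",
    ".zip": "zip",
    ".exe": "windows executable",
}


def sniff_type(data: bytes) -> str:
    """Return the label of the first signature that prefixes `data`."""
    for magic, label in SIGNATURES.items():
        if data.startswith(magic):
            return label
    return "unknown"


def extension_mismatch(path: Path) -> bool:
    """True when both the extension and the content are recognized but disagree."""
    claimed = EXTENSION_TYPES.get(path.suffix.lower())
    with path.open("rb") as fh:
        actual = sniff_type(fh.read(16))
    return claimed is not None and actual != "unknown" and claimed != actual


# Usage: a file named like an image whose bytes start like a Windows executable.
with tempfile.TemporaryDirectory() as tmp:
    disguised = Path(tmp) / "photo.jpg"
    disguised.write_bytes(b"MZ\x90\x00" + b"\x00" * 60)
    print(extension_mismatch(disguised))  # True
```

A signature table like this cannot tell apart text formats such as the CSV, JSON, and YAML files created above, because they have no fixed header bytes. That gap is where a content-based model such as Magika is useful.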
| 1656 | +<p><a class="reference external" href="https://bit.ly/45tdw5O">Link to Magika</a>.</p> |
| 1657 | +</section> |
1578 | 1658 | </section>
|
1579 | 1659 |
|
1580 | 1660 | <script type="text/x-thebe-config">
|
@@ -1655,6 +1735,7 @@ <h2><span class="section-number">7.2.15. </span>Retry on Failure with Prefect<a
|
1655 | 1735 | <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#pytube-a-lightweight-python-library-for-downloading-youtube-videos">7.2.13. PyTube: A Lightweight Python Library for Downloading YouTube Videos</a></li>
|
1656 | 1736 | <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#limit-the-execution-time-of-a-function-call-with-prefect">7.2.14. Limit the Execution Time of a Function Call with Prefect</a></li>
|
1657 | 1737 | <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#retry-on-failure-with-prefect">7.2.15. Retry on Failure with Prefect</a></li>
|
| 1738 | +<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#magika-detect-file-content-types-with-deep-learning">7.2.16. Magika: Detect File Content Types with Deep Learning</a></li> |
1658 | 1739 | </ul>
|
1659 | 1740 | </nav></div>
|
1660 | 1741 |
|
|