README.md
25 additions & 69 deletions
@@ -17,77 +17,33 @@ This repository is a curated collection of data science articles from CodeCut, c
9.[LLM](#llm)
10.[Speed-up Tools](#speed-up-tools)
-## MLOps

-| Title | Article | Repository | Video |
-|-------|---------|------------|--------|
-| Goodbye Pip and Poetry. Why UV Might Be All You Need |[🔗](https://codecut.ai/why-uv-might-all-you-need/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)||
-| Stop Hard Coding in a Data Science Project – Use Configuration Files Instead | [🔗](https://codecut.ai/stop-hard-coding-in-a-data-science-project-use-configuration-files-instead/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/hydra-demo) | [🔗](https://youtu.be/jaX9zrC7y4Y)
-| Poetry: A Better Way to Manage Python Dependencies | [🔗](https://codecut.ai/poetry-a-better-way-to-manage-python-dependencies/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | | [🔗](https://youtu.be/-QSUyDvHQGY)
-| Git for Data Scientists: Learn Git through Practical Examples | [🔗](https://codecut.ai/git-deep-dive-for-data-scientists/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | | [🔗](https://youtu.be/UKCTvrJSoL0)
-| 4 pre-commit Plugins to Automate Code Reviewing and Formatting in Python | [🔗](https://codecut.ai/4-pre-commit-plugins-to-automate-code-reviewing-and-formatting-in-python-2/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/Data-science/tree/master/productive_tools/precommit_examples) | [🔗](https://youtube.com/playlist?list=PLnK6m_JBRVNqskWiXLxx1QRDDng9O8Fsf)
-| How to Structure a Data Science Project for Maintainability | [🔗](https://codecut.ai/how-to-structure-a-data-science-project-for-readability-and-transparency-2/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/data-science-template/tree/dvc-poetry) | [🔗](https://youtu.be/TzvcPi3nsdw)
-| How to Build a Fully Automated Data Drift Detection Pipeline | [🔗](https://codecut.ai/build-a-fully-automated-data-drift-detection-pipeline/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/detect-data-drift-pipeline) | [🔗](https://youtu.be/4w2ly3WuL40)

-## Data Management Tools
+| Category | Title | Article | Repository | Video |
+| MLOps | Goodbye Pip and Poetry. Why UV Might Be All You Need |[🔗](https://codecut.ai/why-uv-might-all-you-need/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)|||
+| MLOps | Stop Hard Coding in a Data Science Project – Use Configuration Files Instead |[🔗](https://codecut.ai/stop-hard-coding-in-a-data-science-project-use-configuration-files-instead/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)|[🔗](https://github.com/khuyentran1401/hydra-demo)|[🔗](https://youtu.be/jaX9zrC7y4Y)|
+| MLOps | Poetry: A Better Way to Manage Python Dependencies |[🔗](https://codecut.ai/poetry-a-better-way-to-manage-python-dependencies/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)||[🔗](https://youtu.be/-QSUyDvHQGY)|
+| MLOps | Git for Data Scientists: Learn Git through Practical Examples |[🔗](https://codecut.ai/git-deep-dive-for-data-scientists/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)||[🔗](https://youtu.be/UKCTvrJSoL0)|
+| MLOps | 4 pre-commit Plugins to Automate Code Reviewing and Formatting in Python |[🔗](https://codecut.ai/4-pre-commit-plugins-to-automate-code-reviewing-and-formatting-in-python-2/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)|[🔗](https://github.com/khuyentran1401/Data-science/tree/master/productive_tools/precommit_examples)|[🔗](https://youtube.com/playlist?list=PLnK6m_JBRVNqskWiXLxx1QRDDng9O8Fsf)|
+| MLOps | How to Structure a Data Science Project for Maintainability |[🔗](https://codecut.ai/how-to-structure-a-data-science-project-for-readability-and-transparency-2/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)|[🔗](https://github.com/khuyentran1401/data-science-template/tree/dvc-poetry)|[🔗](https://youtu.be/TzvcPi3nsdw)|
+| MLOps | How to Build a Fully Automated Data Drift Detection Pipeline |[🔗](https://codecut.ai/build-a-fully-automated-data-drift-detection-pipeline/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)|[🔗](https://github.com/khuyentran1401/detect-data-drift-pipeline)|[🔗](https://youtu.be/4w2ly3WuL40)|
+| Data Management Tools | Version Control for Data and Models Using DVC |[🔗](https://codecut.ai/introduction-to-dvc-data-version-control-tool-for-machine-learning-projects-2/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)|[🔗](https://github.com/khuyentran1401/dvc-demo)|[🔗](https://youtu.be/80s_dbfiqLM)|
+| Data Management Tools | What is dbt (data build tool) and When should you use it? |[🔗](https://codecut.ai/build-an-efficient-data-pipeline-is-dbt-the-key/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)|[🔗](https://github.com/khuyentran1401/dbt-demo)|[🔗](https://youtu.be/mM5zWBP3G_U)|
+| Data Management Tools | Streamline dbt Model Development with Notebook-Style Workspace |[🔗](https://codecut.ai/dbt-mage-interactively-build-and-orchestrate-data-models/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)|[🔗](https://github.com/khuyentran1401/dbt-mage)|[🔗](https://youtu.be/vQFg1Mp60-s)|
+| Testing | Pytest for Data Scientists |[🔗](https://codecut.ai/pytest-for-data-scientists-3/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)|[🔗](https://github.com/khuyentran1401/Data-science/tree/master/data_science_tools/pytest)|[🔗](https://www.youtube.com/playlist?list=PLnK6m_JBRVNoYEer9hBmTNwkYB3gmbOPO)|
+| Python Helper Tools | Introducing FugueSQL — SQL for Pandas, Spark, and Dask DataFrames |[🔗](https://codecut.ai/introducing-fuguesql-sql-for-pandas-spark-and-dask-dataframes-2/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)|[🔗](https://github.com/khuyentran1401/Data-science/blob/master/data_science_tools/fugueSQL.ipynb)||
+| Python Helper Tools | Fugue and DuckDB: Fast SQL Code in Python |[🔗](https://codecut.ai/fugue-and-duckdb-fast-sql-code-in-python-2/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)|[🔗](https://github.com/khuyentran1401/Data-science/blob/master/productive_tools/Fugue_and_Duckdb/Fugue_and_Duckdb.ipynb)||
+| Feature Engineering | Polars vs. Pandas: A Fast, Multi-Core Alternative for DataFrames |[🔗](https://codecut.ai/polars-vs-pandas-a-fast-multi-core-alternative-for-dataframes/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)|[🔗](https://khuyentran1401.github.io/Data-science/data_science_tools/polars_vs_pandas.html)||
+| Visualization | Top 6 Python Libraries for Visualization: Which one to Use? |[🔗](https://codecut.ai/top-6-python-libraries-for-visualization-which-one-to-use/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)|[🔗](https://github.com/khuyentran1401/Data-science/tree/master/visualization/top_visualization.ipynb)||
+| Python | Python Clean Code: 6 Best Practices to Make Your Python Functions More Readable |[🔗](https://codecut.ai/python-clean-code-6-best-practices-to-make-your-python-functions-more-readable-2/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)|[🔗](https://github.com/khuyentran1401/Data-science/tree/master/python/good_functions)|[🔗](https://youtu.be/IDHD8JYBl5M)|
+| Logging and Debugging | Loguru: Simple as Print, Flexible as Logging |[🔗](https://codecut.ai/simplify-your-python-logging-with-loguru/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)|[🔗](https://github.com/khuyentran1401/Data-science/tree/master/productive_tools/logging_tools)|[🔗](https://youtu.be/XY_OrUoR-HU)|
+| LLM | Enforce Structured Outputs from LLMs with PydanticAI |[🔗](https://codecut.ai/enforce-structured-outputs-from-llms-with-pydanticai/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)|[🔗](https://khuyentran1401.github.io/Data-science/llm/pydantic_ai_examples.html)||
-| Version Control for Data and Models Using DVC | [🔗](https://codecut.ai/introduction-to-dvc-data-version-control-tool-for-machine-learning-projects-2/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/dvc-demo) | [🔗](https://youtu.be/80s_dbfiqLM)
-| What is dbt (data build tool) and When should you use it? | [🔗](https://codecut.ai/build-an-efficient-data-pipeline-is-dbt-the-key/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/dbt-demo) | [🔗](https://youtu.be/mM5zWBP3G_U)
-| Streamline dbt Model Development with Notebook-Style Workspace | [🔗](https://codecut.ai/dbt-mage-interactively-build-and-orchestrate-data-models/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/dbt-mage) | [🔗](https://youtu.be/vQFg1Mp60-s)
-
-## Testing
-
-| Title | Article | Repository | Video |
-|-------|---------|------------|--------|
-| Pytest for Data Scientists | [🔗](https://codecut.ai/pytest-for-data-scientists-3/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/Data-science/tree/master/data_science_tools/pytest) | [🔗](https://www.youtube.com/playlist?list=PLnK6m_JBRVNoYEer9hBmTNwkYB3gmbOPO)
-| Introducing FugueSQL — SQL for Pandas, Spark, and Dask DataFrames | [🔗](https://codecut.ai/introducing-fuguesql-sql-for-pandas-spark-and-dask-dataframes-2/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/Data-science/blob/master/data_science_tools/fugueSQL.ipynb)
-| Fugue and DuckDB: Fast SQL Code in Python | [🔗](https://codecut.ai/fugue-and-duckdb-fast-sql-code-in-python-2/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/Data-science/blob/master/productive_tools/Fugue_and_Duckdb/Fugue_and_Duckdb.ipynb)
-
-## Feature Engineering
-
-| Title | Article | Repository | Video |
-|-------|---------|------------|--------|
-| Polars vs. Pandas: A Fast, Multi-Core Alternative for DataFrames | [🔗](https://codecut.ai/polars-vs-pandas-a-fast-multi-core-alternative-for-dataframes/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://khuyentran1401.github.io/Data-science/data_science_tools/polars_vs_pandas.html)
-
-## Visualization
-
-| Title | Article | Repository | Video |
-|-------|---------|------------|--------|
-| Top 6 Python Libraries for Visualization: Which one to Use? | [🔗](https://codecut.ai/top-6-python-libraries-for-visualization-which-one-to-use/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/Data-science/tree/master/visualization/top_visualization.ipynb)
-
-## Python
-
-| Title | Article | Repository | Video |
-|-------|---------|------------|--------|
-| Python Clean Code: 6 Best Practices to Make Your Python Functions More Readable | [🔗](https://codecut.ai/python-clean-code-6-best-practices-to-make-your-python-functions-more-readable-2/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/Data-science/tree/master/python/good_functions) | [🔗](https://youtu.be/IDHD8JYBl5M)
-
-## Logging and Debugging
-
-| Title | Article | Repository | Video |
-|-------|---------|------------|--------|
-| Loguru: Simple as Print, Flexible as Logging | [🔗](https://codecut.ai/simplify-your-python-logging-with-loguru/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/Data-science/tree/master/productive_tools/logging_tools) | [🔗](https://youtu.be/XY_OrUoR-HU)
-
-## LLM
-
-| Title | Article | Repository | Video |
-|-------|---------|------------|--------|
-| Enforce Structured Outputs from LLMs with PydanticAI |[🔗](https://codecut.ai/enforce-structured-outputs-from-llms-with-pydanticai/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)|[🔗](https://khuyentran1401.github.io/Data-science/llm/pydantic_ai_examples.html)|
-
-## Speed-up Tools
-
-| Title | Article | Repository | Video |
-|-------|---------|------------|--------|
-| Writing Safer PySpark Queries with Parameters |[🔗](https://codecut.ai/pyspark-sql-enhancing-reusability-with-parameterized-queries/)|[🔗](https://khuyentran1401.github.io/Data-science/data_science_tools/pandas_api_on_spark.html)|