Commit 7c8039d

Update README.md
1 parent e669f0d commit 7c8039d

1 file changed, +25 -69 lines changed

README.md

Lines changed: 25 additions & 69 deletions
@@ -17,77 +17,33 @@ This repository is a curated collection of data science articles from CodeCut, c
 9. [LLM](#llm)
 10. [Speed-up Tools](#speed-up-tools)

-## MLOps

-| Title | Article | Repository | Video |
-|-------|---------|------------|--------|
-| Goodbye Pip and Poetry. Why UV Might Be All You Need | [🔗](https://codecut.ai/why-uv-might-all-you-need/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | |
-| Stop Hard Coding in a Data Science Project – Use Configuration Files Instead | [🔗](https://codecut.ai/stop-hard-coding-in-a-data-science-project-use-configuration-files-instead/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/hydra-demo) | [🔗](https://youtu.be/jaX9zrC7y4Y)
-| Poetry: A Better Way to Manage Python Dependencies | [🔗](https://codecut.ai/poetry-a-better-way-to-manage-python-dependencies/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | | [🔗](https://youtu.be/-QSUyDvHQGY)
-| Git for Data Scientists: Learn Git through Practical Examples | [🔗](https://codecut.ai/git-deep-dive-for-data-scientists/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | | [🔗](https://youtu.be/UKCTvrJSoL0)
-| 4 pre-commit Plugins to Automate Code Reviewing and Formatting in Python | [🔗](https://codecut.ai/4-pre-commit-plugins-to-automate-code-reviewing-and-formatting-in-python-2/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/Data-science/tree/master/productive_tools/precommit_examples) | [🔗](https://youtube.com/playlist?list=PLnK6m_JBRVNqskWiXLxx1QRDDng9O8Fsf)
-| How to Structure a Data Science Project for Maintainability | [🔗](https://codecut.ai/how-to-structure-a-data-science-project-for-readability-and-transparency-2/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/data-science-template/tree/dvc-poetry) | [🔗](https://youtu.be/TzvcPi3nsdw)
-| Build Reliable Machine Learning Pipelines with Continuous Integration | [🔗](https://codecut.ai/build-reliable-machine-learning-pipelines-with-continuous-integration-2/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/cicd-mlops-demo) | [🔗](https://youtu.be/rkg09nNMAhs)
-| Automate Machine Learning Deployment with GitHub Actions | [🔗](https://codecut.ai/automate-machine-learning-deployment-with-github-actions-2/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/cicd-mlops-demo) | [🔗](https://youtu.be/728M0yhI0_M)
-| How to Build a Fully Automated Data Drift Detection Pipeline | [🔗](https://codecut.ai/build-a-fully-automated-data-drift-detection-pipeline/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/detect-data-drift-pipeline) | [🔗](https://youtu.be/4w2ly3WuL40)

-## Data Management Tools
+| Category              | Title                                                                           | Article                                                                                                                                                                     | Repository                                                                                                                | Video                                                                          |
+| --------------------- | ------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------ |
+| MLOps                 | Goodbye Pip and Poetry. Why UV Might Be All You Need                            | [🔗](https://codecut.ai/why-uv-might-all-you-need/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)                                                        |                                                                                                                           |                                                                                |
+| MLOps                 | Stop Hard Coding in a Data Science Project – Use Configuration Files Instead    | [🔗](https://codecut.ai/stop-hard-coding-in-a-data-science-project-use-configuration-files-instead/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)       | [🔗](https://github.com/khuyentran1401/hydra-demo)                                                                        | [🔗](https://youtu.be/jaX9zrC7y4Y)                                             |
+| MLOps                 | Poetry: A Better Way to Manage Python Dependencies                              | [🔗](https://codecut.ai/poetry-a-better-way-to-manage-python-dependencies/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)                                |                                                                                                                           | [🔗](https://youtu.be/-QSUyDvHQGY)                                             |
+| MLOps                 | Git for Data Scientists: Learn Git through Practical Examples                   | [🔗](https://codecut.ai/git-deep-dive-for-data-scientists/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)                                                |                                                                                                                           | [🔗](https://youtu.be/UKCTvrJSoL0)                                             |
+| MLOps                 | 4 pre-commit Plugins to Automate Code Reviewing and Formatting in Python        | [🔗](https://codecut.ai/4-pre-commit-plugins-to-automate-code-reviewing-and-formatting-in-python-2/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)       | [🔗](https://github.com/khuyentran1401/Data-science/tree/master/productive_tools/precommit_examples)                      | [🔗](https://youtube.com/playlist?list=PLnK6m_JBRVNqskWiXLxx1QRDDng9O8Fsf)     |
+| MLOps                 | How to Structure a Data Science Project for Maintainability                     | [🔗](https://codecut.ai/how-to-structure-a-data-science-project-for-readability-and-transparency-2/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)       | [🔗](https://github.com/khuyentran1401/data-science-template/tree/dvc-poetry)                                             | [🔗](https://youtu.be/TzvcPi3nsdw)                                             |
+| MLOps                 | Build Reliable Machine Learning Pipelines with Continuous Integration           | [🔗](https://codecut.ai/build-reliable-machine-learning-pipelines-with-continuous-integration-2/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)          | [🔗](https://github.com/khuyentran1401/cicd-mlops-demo)                                                                   | [🔗](https://youtu.be/rkg09nNMAhs)                                             |
+| MLOps                 | Automate Machine Learning Deployment with GitHub Actions                        | [🔗](https://codecut.ai/automate-machine-learning-deployment-with-github-actions-2/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)                       | [🔗](https://github.com/khuyentran1401/cicd-mlops-demo)                                                                   | [🔗](https://youtu.be/728M0yhI0_M)                                             |
+| MLOps                 | How to Build a Fully Automated Data Drift Detection Pipeline                    | [🔗](https://codecut.ai/build-a-fully-automated-data-drift-detection-pipeline/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)                            | [🔗](https://github.com/khuyentran1401/detect-data-drift-pipeline)                                                        | [🔗](https://youtu.be/4w2ly3WuL40)                                             |
+| Data Management Tools | Version Control for Data and Models Using DVC                                   | [🔗](https://codecut.ai/introduction-to-dvc-data-version-control-tool-for-machine-learning-projects-2/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)    | [🔗](https://github.com/khuyentran1401/dvc-demo)                                                                          | [🔗](https://youtu.be/80s_dbfiqLM)                                             |
+| Data Management Tools | What is dbt (data build tool) and When should you use it?                       | [🔗](https://codecut.ai/build-an-efficient-data-pipeline-is-dbt-the-key/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)                                  | [🔗](https://github.com/khuyentran1401/dbt-demo)                                                                          | [🔗](https://youtu.be/mM5zWBP3G_U)                                             |
+| Data Management Tools | Streamline dbt Model Development with Notebook-Style Workspace                  | [🔗](https://codecut.ai/dbt-mage-interactively-build-and-orchestrate-data-models/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)                         | [🔗](https://github.com/khuyentran1401/dbt-mage)                                                                          | [🔗](https://youtu.be/vQFg1Mp60-s)                                             |
+| Testing               | Pytest for Data Scientists                                                      | [🔗](https://codecut.ai/pytest-for-data-scientists-3/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)                                                     | [🔗](https://github.com/khuyentran1401/Data-science/tree/master/data_science_tools/pytest)                                | [🔗](https://www.youtube.com/playlist?list=PLnK6m_JBRVNoYEer9hBmTNwkYB3gmbOPO) |
+| Python Helper Tools   | Write Clean Python Code Using Pipes                                             | [🔗](https://codecut.ai/write-clean-python-code-using-pipes-3/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)                                            | [🔗](https://deepnote.com/project/Data-science-hxlyJpi-QrKFJziQgoMSmQ/%2FData-science%2Fproductive_tools%2Fpipe.ipynb)    | [🔗](https://youtu.be/K20_eZZGqsc)                                             |
+| Python Helper Tools   | Introducing FugueSQL — SQL for Pandas, Spark, and Dask DataFrames               | [🔗](https://codecut.ai/introducing-fuguesql-sql-for-pandas-spark-and-dask-dataframes-2/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)                  | [🔗](https://github.com/khuyentran1401/Data-science/blob/master/data_science_tools/fugueSQL.ipynb)                        |                                                                                |
+| Python Helper Tools   | Fugue and DuckDB: Fast SQL Code in Python                                       | [🔗](https://codecut.ai/fugue-and-duckdb-fast-sql-code-in-python-2/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)                                       | [🔗](https://github.com/khuyentran1401/Data-science/blob/master/productive_tools/Fugue_and_Duckdb/Fugue_and_Duckdb.ipynb) |                                                                                |
+| Feature Engineering   | Polars vs. Pandas: A Fast, Multi-Core Alternative for DataFrames                | [🔗](https://codecut.ai/polars-vs-pandas-a-fast-multi-core-alternative-for-dataframes/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)                    | [🔗](https://khuyentran1401.github.io/Data-science/data_science_tools/polars_vs_pandas.html)                              |                                                                                |
+| Visualization         | Top 6 Python Libraries for Visualization: Which one to Use?                     | [🔗](https://codecut.ai/top-6-python-libraries-for-visualization-which-one-to-use/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)                        | [🔗](https://github.com/khuyentran1401/Data-science/tree/master/visualization/top_visualization.ipynb)                    |                                                                                |
+| Python                | Python Clean Code: 6 Best Practices to Make Your Python Functions More Readable | [🔗](https://codecut.ai/python-clean-code-6-best-practices-to-make-your-python-functions-more-readable-2/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/Data-science/tree/master/python/good_functions)                                    | [🔗](https://youtu.be/IDHD8JYBl5M)                                             |
+| Logging and Debugging | Loguru: Simple as Print, Flexible as Logging                                    | [🔗](https://codecut.ai/simplify-your-python-logging-with-loguru/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)                                         | [🔗](https://github.com/khuyentran1401/Data-science/tree/master/productive_tools/logging_tools)                           | [🔗](https://youtu.be/XY_OrUoR-HU)                                             |
+| LLM                   | Enforce Structured Outputs from LLMs with PydanticAI                            | [🔗](https://codecut.ai/enforce-structured-outputs-from-llms-with-pydanticai/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog)                             | [🔗](https://khuyentran1401.github.io/Data-science/llm/pydantic_ai_examples.html)                                         |                                                                                |
+| Speed-up Tools        | Writing Safer PySpark Queries with Parameters                                   | [🔗](https://codecut.ai/pyspark-sql-enhancing-reusability-with-parameterized-queries/)                                                                                      | [🔗](https://khuyentran1401.github.io/Data-science/data_science_tools/pandas_api_on_spark.html)                           |                                                                                |

-| Title | Article | Repository | Video |
-|-------|---------|------------|--------|
-| Version Control for Data and Models Using DVC | [🔗](https://codecut.ai/introduction-to-dvc-data-version-control-tool-for-machine-learning-projects-2/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/dvc-demo) | [🔗](https://youtu.be/80s_dbfiqLM)
-| What is dbt (data build tool) and When should you use it? | [🔗](https://codecut.ai/build-an-efficient-data-pipeline-is-dbt-the-key/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/dbt-demo) | [🔗](https://youtu.be/mM5zWBP3G_U)
-| Streamline dbt Model Development with Notebook-Style Workspace | [🔗](https://codecut.ai/dbt-mage-interactively-build-and-orchestrate-data-models/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/dbt-mage) | [🔗](https://youtu.be/vQFg1Mp60-s)
-
-## Testing
-
-| Title | Article | Repository | Video |
-|-------|---------|------------|--------|
-| Pytest for Data Scientists | [🔗](https://codecut.ai/pytest-for-data-scientists-3/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/Data-science/tree/master/data_science_tools/pytest) | [🔗](https://www.youtube.com/playlist?list=PLnK6m_JBRVNoYEer9hBmTNwkYB3gmbOPO)
-
-## Python Helper Tools
-
-| Title | Article | Repository | Video |
-|-------|---------|------------|--------|
-| Write Clean Python Code Using Pipes | [🔗](https://codecut.ai/write-clean-python-code-using-pipes-3/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://deepnote.com/project/Data-science-hxlyJpi-QrKFJziQgoMSmQ/%2FData-science%2Fproductive_tools%2Fpipe.ipynb) | [🔗](https://youtu.be/K20_eZZGqsc)
-| Introducing FugueSQL — SQL for Pandas, Spark, and Dask DataFrames | [🔗](https://codecut.ai/introducing-fuguesql-sql-for-pandas-spark-and-dask-dataframes-2/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/Data-science/blob/master/data_science_tools/fugueSQL.ipynb)
-| Fugue and DuckDB: Fast SQL Code in Python | [🔗](https://codecut.ai/fugue-and-duckdb-fast-sql-code-in-python-2/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/Data-science/blob/master/productive_tools/Fugue_and_Duckdb/Fugue_and_Duckdb.ipynb)
-
-## Feature Engineering
-
-| Title | Article | Repository | Video |
-|-------|---------|------------|--------|
-| Polars vs. Pandas: A Fast, Multi-Core Alternative for DataFrames | [🔗](https://codecut.ai/polars-vs-pandas-a-fast-multi-core-alternative-for-dataframes/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://khuyentran1401.github.io/Data-science/data_science_tools/polars_vs_pandas.html)
-
-## Visualization
-
-| Title | Article | Repository | Video |
-|-------|---------|------------|--------|
-| Top 6 Python Libraries for Visualization: Which one to Use? | [🔗](https://codecut.ai/top-6-python-libraries-for-visualization-which-one-to-use/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/Data-science/tree/master/visualization/top_visualization.ipynb)
-
-## Python
-
-| Title | Article | Repository | Video |
-|-------|---------|------------|--------|
-| Python Clean Code: 6 Best Practices to Make Your Python Functions More Readable | [🔗](https://codecut.ai/python-clean-code-6-best-practices-to-make-your-python-functions-more-readable-2/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/Data-science/tree/master/python/good_functions) | [🔗](https://youtu.be/IDHD8JYBl5M)
-
-## Logging and Debugging
-
-| Title | Article | Repository | Video |
-|-------|---------|------------|--------|
-| Loguru: Simple as Print, Flexible as Logging | [🔗](https://codecut.ai/simplify-your-python-logging-with-loguru/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://github.com/khuyentran1401/Data-science/tree/master/productive_tools/logging_tools) | [🔗](https://youtu.be/XY_OrUoR-HU)
-
-## LLM
-
-| Title | Article | Repository | Video |
-|-------|---------|------------|--------|
-| Enforce Structured Outputs from LLMs with PydanticAI | [🔗](https://codecut.ai/enforce-structured-outputs-from-llms-with-pydanticai/?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https://khuyentran1401.github.io/Data-science/llm/pydantic_ai_examples.html) |
-
-## Speed-up Tools
-
-| Title | Article | Repository | Video |
-|-------|---------|------------|--------|
-| Writing Safer PySpark Queries with Parameters | [🔗](https://codecut.ai/pyspark-sql-enhancing-reusability-with-parameterized-queries/) | [🔗](https://khuyentran1401.github.io/Data-science/data_science_tools/pandas_api_on_spark.html) |

 ## Contributing

@@ -100,4 +56,4 @@ To contribute:
 - Click "New issue"
 - Select "Article Topic Suggestion" template
 - Fill in the template with your article proposal
-2. Read our [contribution guidelines](contribution.md)
+2. Read our [contribution guidelines](contribution.md)
