Welcome to Data Mavericks – Group 13 of the MIT Emerging Talent Certificate Program.
This repository is our shared workspace for tackling data-science challenges: gathering and cleaning datasets, experimenting with analytical methods, and turning results into clear insights. Our specific research angle is still being finalized, but we expect to dive into finance-focused datasets and problems.
- Project Overview
- How to Contribute
- Project Goals
- Team Composition
- Repository Structure
- Setup and Usage
- Roles and Responsibilities
- Task Breakdown
- Definition of Done
- Tools and Technologies
- Milestones
- License
This repository serves as the hub for our group collaboration, where we:
- Collaborate on data-science projects: Explore real-world datasets together to sharpen our analytical and coding skills.
- Learn from each other: Share knowledge, resources, and tips to help everyone grow.
- Code & notebook reviews: Provide constructive feedback to improve quality, reproducibility, and best practices.
- Tell the story with data: Turn findings into clear visualizations and concise write-ups.
- Build community: Strengthen our team spirit as we progress through the MIT Emerging Talent Program.
We encourage every team member to jump in. You can contribute by:
- Picking up tasks – Check the Task Breakdown for your assignments or claim an open issue.
- Setting up locally – Follow Setup and Usage to install dependencies and run tests.
- Working in a feature branch – Name it `<firstname>/<short_description_snake_case>` (e.g. `sukhrob/data_cleaning_pipeline`).
- Adding code or notebooks – Keep notebooks reproducible (clean outputs, random seeds fixed) and adhere to our lint rules.
- Submitting a pull request – Link any related issue, ensure CI passes, and confirm your work meets the Definition of Done.
- Reviewing peers’ PRs – Leave constructive comments, suggest improvements, and approve when ready.
- Opening issues or discussions – Use issue templates for bugs, ideas, or dataset proposals so we can track them transparently.
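The branch naming convention in the list above can be checked before pushing. The following is a minimal sketch, not part of the repository: the regular expression is our assumed reading of `<firstname>/<short_description_snake_case>` (lowercase first name, a slash, lowercase words joined by underscores), and `is_valid_branch_name` is a hypothetical helper name.

```python
import re

# Assumed interpretation of "<firstname>/<short_description_snake_case>":
# a lowercase first name, a slash, then lowercase words joined by underscores.
BRANCH_PATTERN = re.compile(r"[a-z]+/[a-z0-9]+(?:_[a-z0-9]+)*")


def is_valid_branch_name(name: str) -> bool:
    """Return True if `name` matches the assumed branch naming pattern."""
    return BRANCH_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in (
        "sukhrob/data_cleaning_pipeline",  # example from the guidelines
        "Sukhrob/DataCleaning",            # uppercase, not snake_case
        "data_cleaning",                   # missing the "<firstname>/" prefix
    ):
        print(candidate, is_valid_branch_name(candidate))
```

If the team agrees on a stricter or looser rule (for example, allowing hyphens in names), only `BRANCH_PATTERN` needs to change.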
- Master professional GitHub workflows (branches, PRs, reviews, CI checks) for reproducible analytics.
- Build teamwork and collaboration skills by pairing on notebooks, code, and documentation.
- Explore the full data-science lifecycle—data collection, cleaning, modelling, and validation—on finance-related datasets.
- Apply advanced Python libraries (pandas, NumPy, scikit-learn, matplotlib/plotly, pytest) to create robust analysis pipelines, rich visualizations, and a comprehensive unit-test suite.
- Communicate insights clearly through well-designed visuals and concise, audience-appropriate write-ups.
Name | GitHub | Time Zone |
---|---|---|
Ahmed Elhassan | Goutbi | GMT +4 |
Anass Ziadah | ziadahanass | GMT +3 |
Clement Mugisha | Bikaze | GMT +2 |
Emre Biyik | emrebiyik | GMT +2 |
Mustafa Mangal | Mustafa-Mangal | GMT +2 |
Sukhrob Muborakshoev | suhrobmuboraksho | GMT -7 |
.
│ .gitignore
│ .ls-lint.yml
│ .markdownlint.yml
│ CONTRIBUTING.md
│ LICENSE
│ README.md
│ guide.md
│
├───.github
│ ├───ISSUE_TEMPLATE
│ └───workflows
├───.vscode
│
├───0_domain_study
├───1_datasets
├───2_data_preparation
├───3_data_exploration
├───4_data_analysis
├───5_communication_strategy
├───6_final_presentation
│
├───collaboration
└───notes
- Python 3.9+
- Git
- A code editor (VS Code, PyCharm, etc.)
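Before cloning, it can help to confirm that the prerequisites above are in place. This is a small sketch using only the standard library; `meets_minimum` and `has_git` are hypothetical helper names, and the minimum version is taken from the list above.

```python
import shutil
import sys

MIN_VERSION = (3, 9)  # from the prerequisites list


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION) -> bool:
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


def has_git() -> bool:
    """Return True if a `git` executable is found on the PATH."""
    return shutil.which("git") is not None


if __name__ == "__main__":
    print("Python version OK:", meets_minimum())
    print("Git found:", has_git())
```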
git clone https://github.com/MIT-Emerging-Talent/ET6-CDSP-group-13-repo.git
cd ET6-CDSP-group-13-repo
python -m venv .venv
# macOS / Linux
source .venv/bin/activate
# Windows (PowerShell)
.venv\Scripts\Activate.ps1
pip install -r requirements.txt # add this file when you freeze libs
pre-commit install # optional: auto-lint on every commit
pytest -q
# Example: pull data versioned with DVC or Git LFS
dvc pull # or: git lfs pull
jupyter lab
Tip: keep notebooks in a dedicated `notebooks/` folder to minimise merge conflicts.
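The contribution guidelines ask for reproducible notebooks with clean outputs and fixed random seeds. The sketch below shows one way to do both with the standard library only. It assumes notebooks use the nbformat 4 JSON layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`); `strip_outputs` and `seeded_sample` are hypothetical helper names, not existing project code.

```python
import json
import random


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with all code-cell outputs cleared."""
    cleaned = json.loads(json.dumps(notebook))  # deep copy via JSON round trip
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def seeded_sample(seed: int, n: int = 3) -> list:
    """Draw `n` floats from a generator seeded with `seed` (repeatable)."""
    rng = random.Random(seed)  # local generator, global state is untouched
    return [rng.random() for _ in range(n)]


if __name__ == "__main__":
    nb = {
        "cells": [
            {"cell_type": "markdown", "source": ["# Title"]},
            {
                "cell_type": "code",
                "source": ["1 + 1"],
                "outputs": [{"output_type": "execute_result"}],
                "execution_count": 7,
            },
        ]
    }
    print(strip_outputs(nb)["cells"][1]["outputs"])
    print(seeded_sample(42) == seeded_sample(42))
```

Libraries such as NumPy and scikit-learn have their own seeding mechanisms, so a fixed seed for `random` alone does not make every notebook deterministic.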
- Acquire and prepare data – source datasets, clean them, and document provenance.
- Analyze and model – explore patterns, build statistical / ML models, and validate results.
- Create artifacts – code modules, notebooks, visualizations, and concise reports.
- Review and improve – give constructive feedback on peers’ pull requests and notebooks.
- Maintain project hygiene – follow branch naming, pass CI checks, and update documentation.
- Ahmed Elhassan
- Anass Ziadah
- Clement Mugisha
- Emre Biyik
- Mustafa Mangal
- Sukhrob Muborakshoev
Goal: Each team member drafts one finance-data-science project idea to discuss in this week’s meeting.
Include ⬇️
- Problem statement (1–2 sentences)
- Potential dataset(s) / data source
- Expected value or insight
- Sketch of analysis approach (very high-level)
Team Member | Deliverable | Due (before meeting) |
---|---|---|
Ahmed Elhassan | Project-idea brief (≤1 page) | 14 Jun 2025 |
Anass Ziadah | Project-idea brief (≤1 page) | 14 Jun 2025 |
Clement Mugisha | Project-idea brief (≤1 page) | 14 Jun 2025 |
Emre Biyik | Project-idea brief (≤1 page) | 14 Jun 2025 |
Mustafa Mangal | Project-idea brief (≤1 page) | 14 Jun 2025 |
Sukhrob Muborakshoev | Project-idea brief (≤1 page) | 14 Jun 2025 |
Tip: Save your idea as `0_domain_study/ideas/<firstname>/idea.md` (or a short notebook) in a feature branch named `firstname/project_idea`.
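To start a brief quickly, the file location from the tip and the four requested sections can be scaffolded with a short script. This is only a sketch: the Markdown heading layout, the `TODO` placeholders, the lowercased folder name, and the function names are our assumptions, not a required template.

```python
from pathlib import Path

# The four items each project-idea brief should include.
SECTIONS = [
    "Problem statement",
    "Potential dataset(s) / data source",
    "Expected value or insight",
    "Sketch of analysis approach",
]


def idea_path(firstname: str, root: Path = Path(".")) -> Path:
    """Build 0_domain_study/ideas/<firstname>/idea.md under `root`."""
    return root / "0_domain_study" / "ideas" / firstname.lower() / "idea.md"


def render_template(firstname: str) -> str:
    """Return a Markdown skeleton with one heading per requested section."""
    lines = [f"# Project idea: {firstname}", ""]
    for section in SECTIONS:
        lines += [f"## {section}", "", "TODO", ""]
    return "\n".join(lines)


def scaffold(firstname: str, root: Path = Path(".")) -> Path:
    """Create the brief if it does not exist yet and return its path."""
    path = idea_path(firstname, root)
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():  # never overwrite an existing brief
        path.write_text(render_template(firstname), encoding="utf-8")
    return path


if __name__ == "__main__":
    print(scaffold("sukhrob"))
```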
A task is complete when:
- Code modules and notebooks pass automated tests and lint checks, and are approved in code review.
- Collaboration documents are reviewed and approved.
- CI/CD pipelines run successfully.
- Version Control: Git + GitHub
- Project Management: GitHub Issues & Project boards
- Languages / Libraries
- Python 3.9+
- pandas · NumPy · scikit-learn
- matplotlib / Plotly (visualization)
- Environment: Jupyter Lab · virtualenv
- Testing: pytest
- CI/CD: GitHub Actions (lint, unit tests)
- Documentation: Markdown (MkDocs optional for future docs site)
Phase | Date Range (2025) |
---|---|
Cross-Cultural Collaboration | May 27 – June 2 |
Problem Identification | June 3 – June 16 |
Data Collection | June 17 – June 30 |
Data Analysis | July 1 – July 21 |
Communicating Results | July 22 – August 11 |
Final Presentation | August 12 – August 24 |
This project is licensed under the MIT License.