|
2 | 2 |
|
3 | 3 | ## How to Contribute to this Project
|
4 | 4 |
|
5 |
| -1. Clone this repository: |
| 5 | +### 1. Clone this repository: |
6 | 6 |
|
7 |
| - ```bash |
8 |
| - git clone https://github.com/pycontw/pycon-etl |
9 |
| - ``` |
| 7 | +```bash |
| 8 | +git clone https://github.com/pycontw/pycon-etl |
| 9 | +``` |
10 | 10 |
|
11 |
| -2. Create a new branch: |
| 11 | +### 2. Create a new branch: |
12 | 12 |
|
13 |
| - ```bash |
14 |
| - git checkout -b <branch-name> |
15 |
| - ``` |
| 13 | +Please create your branch from the latest master branch before making any code changes. |
16 | 14 |
|
17 |
| -3. Make your changes. |
| 15 | +```bash |
| 16 | +# Switch to the master branch |
| 17 | +git checkout master |
18 | 18 |
|
19 |
| - > **NOTICE:** We are still using Airflow v1, so please read the official document [Apache Airflow v1.10.15 Documentation](https://airflow.apache.org/docs/apache-airflow/1.10.15/) to ensure your changes are compatible with our current version. |
| 19 | +# Ensure that you're on the latest master branch |
| 20 | +git pull origin master |
20 | 21 |
|
21 |
| - If your task uses an external service, add the connection and variable in the Airflow UI. |
| 22 | +# Create a new branch |
| 23 | +git checkout -b <branch-name> |
| 24 | +``` |
22 | 25 |
|
23 |
| -4. Test your changes in your local environment: |
| 26 | +### 3. Make your changes. |
24 | 27 |
|
25 |
| - - Ensure the DAG file is loaded successfully. |
26 |
| - - Verify that the task runs successfully. |
27 |
| - - Confirm that your code is correctly formatted and linted. |
28 |
| - - Check that all necessary dependencies are included in `requirements.txt`. |
| 28 | +If your task uses an external service, add the connection and variable in the Airflow UI. |
29 | 29 |
|
30 |
| -5. Push your branch: |
| 30 | +### 4. Test your changes in your local environment: |
31 | 31 |
|
32 |
| - ```bash |
33 |
| - git push origin <branch-name> |
34 |
| - ``` |
| 32 | +- Ensure that the DAG files are loaded successfully. |
| 33 | +- Verify that the tasks run without errors. |
| 34 | +- Confirm that your code is properly formatted and linted. See the [Convention](#convention) section for more details. |
| 35 | +- Check that all necessary dependencies are included in the `pyproject.toml` file. |
| 36 | + - Airflow dependencies are managed by [uv]. |
| 37 | +- Ensure that all required documentation is provided. |
35 | 38 |
|
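The formatting and dependency items in the checklist above can be bundled into one helper. This is a minimal sketch, not part of the repository: `pre_push_checks` is a hypothetical name, `make format` comes from the Code Formatting section, and `uv lock --check` assumes the project commits a `uv.lock` file. It does not cover DAG loading or task runs, which depend on your local Airflow setup.

```bash
# Hypothetical helper; stops at the first failing check.
pre_push_checks() {
  # Format the code with the project's Makefile target.
  make format || return 1

  # `git diff --quiet` exits non-zero when tracked files have unstaged changes,
  # for example files the formatter has just rewritten.
  if ! git diff --quiet; then
    echo "Working tree has unstaged changes; review and commit them first." >&2
    return 1
  fi

  # Assumption: a uv.lock file exists. Fails if it is out of date with pyproject.toml.
  uv lock --check || return 1
}
```

Run `pre_push_checks` from the repository root before pushing. Any unrelated unstaged edits will also make the second check fail, so commit or stash them first.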
36 |
| -6. Create a Pull Request (PR). |
| 39 | +### 5. Push your branch: |
37 | 40 |
|
38 |
| -7. Wait for the review and merge. |
| 41 | +```bash |
| 42 | +git push origin <branch-name> |
| 43 | +``` |
39 | 44 |
|
40 |
| -8. Write any necessary documentation. |
| 45 | +### 6. Create a Pull Request (PR). |
41 | 46 |
|
42 |
| -## Release Management |
| 47 | +If additional steps are required after merging and deploying (e.g., adding new connections or variables), please list them in the PR description. |
43 | 48 |
|
44 |
| -Please use [GitLab Flow](https://about.gitlab.com/topics/version-control/what-is-gitlab-flow/); otherwise, you cannot pass Docker Hub CI. |
| 49 | +### 7. Wait for the review and merge. |
45 | 50 |
|
46 |
| -## Dependency Management |
| 51 | +## Convention |
47 | 52 |
|
48 |
| -Airflow dependencies are managed by [uv]. For more information, refer to the [Airflow Installation Documentation](https://airflow.apache.org/docs/apache-airflow/1.10.15/installation.html). |
| 53 | +### Airflow Dags |
| 54 | +- Please refer to [「大數據之路:阿里巴巴大數據實戰」 讀書心得](https://medium.com/@davidtnfsh/%E5%A4%A7%E6%95%B0%E6%8D%AE%E4%B9%8B%E8%B7%AF-%E9%98%BF%E9%87%8C%E5%B7%B4%E5%B7%B4%E5%A4%A7%E6%95%B0%E6%8D%AE%E5%AE%9E%E8%B7%B5-%E8%AE%80%E6%9B%B8%E5%BF%83%E5%BE%97-54e795c2b8c) for naming guidelines. |
| 55 | +- Table name convention: |
| 56 | +  |
49 | 57 |
|
50 |
| -## Code Convention |
| 58 | +### Code Formatting |
| 59 | +Please run `make format` to ensure your code is properly formatted before committing; otherwise, the CI will fail. |
51 | 60 |
|
52 |
| -### Airflow DAG |
| 61 | +### Commit Message |
| 62 | +It is recommended to use [Commitizen](https://commitizen-tools.github.io/commitizen/). |
53 | 63 |
|
54 |
| -- Please refer to [this article](https://medium.com/@davidtnfsh/%E5%A4%A7%E6%95%B0%E6%8D%AE%E4%B9%8B%E8%B7%AF-%E9%98%BF%E9%87%8C%E5%B7%B4%E5%B7%B4%E5%A4%A7%E6%95%B0%E6%8D%AE%E5%AE%9E%E8%B7%B5-%E8%AE%80%E6%9B%B8%E5%BF%83%E5%BE%97-54e795c2b8c) for naming guidelines. |
| 64 | +## Release Management (CI/CD) |
| 65 | +We use [Python CI] and [Docker Image CI] to ensure our code quality meets specific standards and that Docker images can be published automatically. |
55 | 66 |
|
56 |
| - - Examples: |
57 |
| - 1. `ods/opening_crawler`: Crawlers written by @Rain. These openings can be used for the recruitment board, which was implemented by @tai271828 and @stacy. |
58 |
| - 2. `ods/survey_cake`: A manually triggered uploader that uploads questionnaires to BigQuery. The uploader should be invoked after we receive the SurveyCake questionnaire. |
| 67 | +When a pull request is created, [Python CI] checks whether the code quality is satisfactory. At the same time, we build a `cache` image using `Dockerfile` and a `test` image with `Dockerfile.test`, which are then pushed to the [GCP Artifact Registry]. |
59 | 68 |
|
60 |
| -- Table name convention: |
61 |
| -  |
| 69 | +After a pull request is merged into the `master` branch, the `cache` and `test` image tags mentioned above are created, along with a new `staging` tag for the image generated from `Dockerfile`. |
62 | 70 |
|
63 |
| -### Format |
| 71 | +Once we verify that the `staging` image functions correctly, we merge the `master` branch into the `prod` branch using the following commands. |
64 | 72 |
|
65 |
| -Please use `make format` to format your code before committing, otherwise, the CI will fail. |
| 73 | +<!--TODO: This is not ideal. The "master" and "prod" branches should be protected and should not allow human pushes. We should create a GitHub action for this.--> |
66 | 74 |
|
67 |
| -### Commit Message |
| 75 | +```bash |
| 76 | +git checkout prod |
| 77 | +git pull origin prod |
68 | 78 |
|
69 |
| -It is recommended to use [Commitizen](https://commitizen-tools.github.io/commitizen/). |
| 79 | +git merge origin/master |
| 80 | + |
| 81 | +git push origin prod |
| 82 | +``` |
70 | 83 |
|
71 |
| -### CI/CD |
| 84 | +This triggers the [Docker Image CI] again to update the `cache`, `test`, and `staging` images, and to create a `latest` image that we will later use for deploying to our production instance. See the [Deployment Guide](./DEPLOYMENT.md) for the next steps. |
72 | 85 |
|
73 |
| -Please check the [.github/workflows](.github/workflows) directory for details. |
| 86 | +```mermaid |
| 87 | +--- |
| 88 | +config: |
| 89 | + theme: 'base' |
| 90 | + gitGraph: |
| 91 | + mainBranchName: 'prod' |
| 92 | + tagLabelFontSize: '25px' |
| 93 | + branchLabelFontSize: '20px' |
| 94 | +--- |
| 95 | + gitGraph |
| 96 | + commit id:"latest features" tag:"latest" |
| 97 | + branch master |
| 98 | + commit id:"staging features" tag:"staging" |
| 99 | + checkout prod |
| 100 | + commit id:"prod config" |
| 101 | + checkout master |
| 102 | + branch feature-1 |
| 103 | + commit id: "new features" tag:"cache" tag:"test" |
| 104 | +``` |
74 | 105 |
|
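As a quick reference, the tagging rules described in this section can be written as a lookup. This is an illustrative sketch only: `image_tags_for` and the event names `pull_request`, `merge_to_master`, and `merge_to_prod` are hypothetical labels, not identifiers from the actual workflow files.

```bash
# Illustrative only: maps a CI trigger to the image tags this section describes.
# The event names are made-up labels, not values used by the real workflows.
image_tags_for() {
  case "$1" in
    pull_request)    echo "cache test" ;;
    merge_to_master) echo "cache test staging" ;;
    merge_to_prod)   echo "cache test staging latest" ;;
    *)
      echo "unknown event: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `image_tags_for merge_to_master` prints `cache test staging`.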
75 |
| -[uv]: https://docs.astral.sh/uv/ |
| 106 | +[uv]: https://docs.astral.sh/uv/ |
| 107 | +[Python CI]: https://github.com/pycontw/pycon-etl/actions/workflows/python.yml |
| 108 | +[Docker Image CI]: https://github.com/pycontw/pycon-etl/actions/workflows/dockerimage.yml |
| 109 | +[GCP Artifact Registry]: https://cloud.google.com/artifact-registry/ |