Commit 76bc75b

clatapie authored and lwasser committed
feat: adding package metrics
1 parent f792d16 commit 76bc75b

7 files changed: 184 additions, 0 deletions

.github/workflows/update-pr-data.yml

Lines changed: 2 additions & 0 deletions
@@ -47,6 +47,8 @@ jobs:
         run: python scripts/get-sprint-data.py
       - name: get-review-contributors
         run: python scripts/get-review-contributors.py
+      - name: get-package-data
+        run: python scripts/get-package-data.py
       - name: Cache metrics
         uses: actions/upload-artifact@v4
         with:

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ node_modules/*
 # don't want them to accidentally push them to the repo.
 .env

+.venv/
 /.quarto/
 _site/*
 _output/*

_data/package_data.csv

Lines changed: 46 additions & 0 deletions
Large diffs are not rendered by default.

contributors/package.qmd

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
+---
+title: pyOpenSci Package Metrics Over Time
+jupyter: python3
+execute:
+  echo: false
+---
+
+This page presents metrics across all pyOpenSci packages.
+
+```{python}
+#| echo: false
+#|
+import ast
+import warnings
+from itables import show
+from pathlib import Path
+
+import altair as alt
+import pandas as pd
+
+# This is a local module that stores the plot theme
+from pyosmetrics.plot_theme import load_poppins_font, register_and_enable_poppins_theme
+
+pd.options.mode.chained_assignment = None
+pd.options.future.infer_string = True
+
+# Suppress all warnings
+warnings.filterwarnings("ignore")
+
+# Load and register the Poppins theme
+load_poppins_font()
+register_and_enable_poppins_theme()
+```
+
+```{python}
+# Build the path to the package data CSV, one level above the working directory
+package_data_path = Path.cwd().parents[0] / "_data" / "package_data.csv"
+
+# Read the DataFrame from the CSV file
+package_df = pd.read_csv(package_data_path)
+```
+
+### Forks count per repository
+
+```{python}
+# Parse the "gh_meta" column back into dictionaries
+package_df['gh_meta'] = package_df['gh_meta'].apply(
+    lambda x: ast.literal_eval(x) if isinstance(x, str) else x
+)
+
+# Extract the "forks_count" value from the "gh_meta" column
+package_df['forks_count'] = package_df['gh_meta'].apply(
+    lambda x: x.get('forks_count') if isinstance(x, dict) else None
+)
+```
+
+
+```{python}
+# Render a bar chart of the forks count
+chart = (
+    alt.Chart(package_df).mark_bar()
+    .encode(
+        x=alt.X('package_name', sort='-y')
+        .title('Package name')
+        .axis(labelAngle=45),
+        y=alt.Y('forks_count:Q')
+        .title('Forks Count'),
+    )
+    .properties(title="Forks Count per Repository")
+    .configure_legend(
+        orient='top',
+        titleAnchor='middle',
+        direction='horizontal',
+        labelFontSize=5,
+    )
+    .interactive()
+)
+
+chart
+```
+
+The table below shows the detailed fork count for each pyOpenSci repository.
+
+```{python}
+# Create an itable to display the DataFrame
+from itables import show
+
+# Display the results as an interactive table
+show(package_df[['package_name', 'forks_count']], max_rows=10)
+```
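
The fork-count extraction in `contributors/package.qmd` relies on `gh_meta` having been written to the CSV as the string form of a dictionary, which `ast.literal_eval` then turns back into a dictionary. A minimal sketch of that round trip follows. The sample rows and package names are hypothetical stand-ins for `_data/package_data.csv`, and the helper names `parse_meta` and `get_forks` are illustrative only, not part of the commit.

```python
import ast
import io

import pandas as pd

# Hypothetical sample rows; the real values come from _data/package_data.csv.
sample = pd.DataFrame(
    {
        "package_name": ["pkg-a", "pkg-b", "pkg-c"],
        "gh_meta": [{"forks_count": 12}, {"forks_count": 3}, None],
    }
)

# Round-trip through CSV in memory: dict cells are written as their string form.
buffer = io.StringIO()
sample.to_csv(buffer, index=False)
buffer.seek(0)
loaded = pd.read_csv(buffer)


def parse_meta(value):
    """Turn a stringified dict back into a dict; pass anything else through."""
    return ast.literal_eval(value) if isinstance(value, str) else value


def get_forks(meta):
    """Return forks_count when the metadata is a dict, else None."""
    return meta.get("forks_count") if isinstance(meta, dict) else None


loaded["gh_meta"] = loaded["gh_meta"].apply(parse_meta)
loaded["forks_count"] = loaded["gh_meta"].apply(get_forks)
```

Rows whose `gh_meta` cell is empty in the CSV come back as missing values, so they end up without a fork count instead of raising an error.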

index.qmd

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ If you are looking to contribute, checkout our [README file](https://www.github.

 * [Contributor Data](/contributors/contributors.html)
 * [Sprint Contributor Data](/contributors/sprints.qmd)
+* [Package Data](/contributors/package.qmd)
 :::
requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -7,3 +7,4 @@ pyarrow
 itables # interactive tables!
 tqdm
 nox
+pygithub # for github API

scripts/get-package-data.py

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+import os
+
+import github
+import pandas as pd
+import yaml
+from pathlib import Path
+
+ACCESS_TOKEN = os.getenv("GITHUB_TOKEN")
+gh = github.Github(ACCESS_TOKEN)
+
+def get_package_data():
+    """
+    Get package data from the pyOpenSci website repository on GitHub.
+
+    Returns
+    -------
+    pandas.DataFrame
+        DataFrame containing the package data.
+    """
+
+    # Get the repository
+    repo = gh.get_repo("pyOpenSci/pyopensci.github.io")
+
+    # Get the ``_data/packages.yml`` file
+    package_data = repo.get_contents("_data/packages.yml")
+    package_data = package_data.decoded_content.decode("utf-8")
+
+    # Load the YAML content
+    package_data = yaml.safe_load(package_data)
+
+    # Convert the parsed YAML content to a DataFrame
+    df = pd.DataFrame.from_dict(package_data)
+
+    return df
+
+if __name__ == "__main__":
+    package_df = get_package_data()
+
+    dir_path = Path("_data")
+    file_path = dir_path / "package_data.csv"
+
+    dir_path.mkdir(parents=True, exist_ok=True)
+    package_df.to_csv(file_path, index=False)
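
`scripts/get-package-data.py` reads `GITHUB_TOKEN` with `os.getenv`, which returns `None` when the variable is unset, so the client would then be created without credentials. One possible guard, sketched here as an assumption and not as part of this commit, is to fail early with a clear message. The helper name `require_token` and the example token value are hypothetical.

```python
import os


def require_token(env=None, name="GITHUB_TOKEN"):
    """Return the token stored under `name`, or raise if it is missing or empty.

    `env` defaults to os.environ; passing a plain dict keeps the check testable.
    """
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"{name} is not set; export it before running the script")
    return token


# Example with an explicit mapping instead of the real environment.
token = require_token({"GITHUB_TOKEN": "example-token"})
```

The returned value could then be passed to `github.Github(...)` in place of `ACCESS_TOKEN`.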
