
Fetch data for 24 days to stay within quota #49


Merged
merged 1 commit into main from 24-days on Apr 29, 2025

Conversation

hugovk
Owner

@hugovk hugovk commented Apr 29, 2025

Dry runs:

pypinfo --all --indent 0 --limit 15000 --days 27 --dry-run "" project
Served from cache: False
Data processed: 1.10 TiB
Data billed: 0.00 B
Estimated cost: $0.00

pypinfo --all --indent 0 --limit 15000 --days 26 --dry-run "" project
Served from cache: False
Data processed: 1.06 TiB
Data billed: 0.00 B
Estimated cost: $0.00

pypinfo --all --indent 0 --limit 15000 --days 25 --dry-run "" project
Served from cache: False
Data processed: 1.01 TiB
Data billed: 0.00 B
Estimated cost: $0.00

pypinfo --all --indent 0 --limit 15000 --days 24 --dry-run "" project
Served from cache: False
Data processed: 984.76 GiB
Data billed: 0.00 B
Estimated cost: $0.00
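
For anyone repeating this search, here's a minimal sketch that automates the dry runs above, assuming pypinfo is installed and BigQuery credentials are configured; the output parsing and the 1 TiB free-tier threshold are assumptions here, not part of this PR:

# Hypothetical helper: find the largest --days window whose dry-run
# estimate stays under BigQuery's 1 TiB monthly free tier.
import re
import subprocess

FREE_TIER_BYTES = 2**40  # 1 TiB

UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

def data_processed(days: int) -> float:
    """Dry-run pypinfo and parse the 'Data processed' line into bytes."""
    out = subprocess.run(
        ["pypinfo", "--all", "--indent", "0", "--limit", "15000",
         "--days", str(days), "--dry-run", "", "project"],
        capture_output=True, text=True, check=True,
    ).stdout
    value, unit = re.search(r"Data processed: ([\d.]+) (\S+)", out).groups()
    return float(value) * UNITS[unit]

for days in range(27, 20, -1):
    processed = data_processed(days)
    print(f"--days {days}: {processed / 2**40:.2f} TiB")
    if processed < FREE_TIER_BYTES:
        print(f"Largest window under the free tier: {days} days")
        break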

@hugovk hugovk merged commit b7d80ec into main Apr 29, 2025
3 checks passed
@hugovk hugovk deleted the 24-days branch April 29, 2025 13:32
@reneleonhardt

@hugovk Does "Data processed" account for the actually fetched values too, or could more columns be fetched?

Would it be possible to enable a GitHub Actions cron job to generate the data every month automatically?

https://docs.github.com/en/actions/writing-workflows/choosing-when-your-workflow-runs/events-that-trigger-workflows#schedule

on:
  schedule:
    # * is a special character in YAML so you have to quote this string
    - cron:  '0 0 1 * *'

@hugovk
Owner Author

hugovk commented Jun 13, 2025

@hugovk Does "Data processed" account for the actually fetched values too, or could more columns be fetched?

It should account for the actually fetched values, because it's the result of the query sent. You can see a snippet in #42 that does more or less the same thing.
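
As a rough illustration of why the estimate tracks the query itself: BigQuery's dry-run mode reports the bytes a query would scan, without billing anything. A minimal sketch, assuming the google-cloud-bigquery package; the query below is only shaped like pypinfo's, not the exact snippet from #42:

# Minimal sketch, assuming google-cloud-bigquery is installed and
# credentials are configured. The query is a hypothetical stand-in
# for the kind of query pypinfo builds.
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

query = """
SELECT file.project AS project, COUNT(*) AS download_count
FROM `bigquery-public-data.pypi.file_downloads`
WHERE DATE(timestamp) BETWEEN
      DATE_SUB(CURRENT_DATE(), INTERVAL 24 DAY)
      AND DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
GROUP BY project
ORDER BY download_count DESC
LIMIT 15000
"""

job = client.query(query, job_config=job_config)
# Only the columns the query touches count towards "Data processed".
print(f"Data processed: {job.total_bytes_processed / 2**30:.2f} GiB")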

Would it be possible to enable a GitHub Actions cron job to generate the data every month automatically?

docs.github.com/en/actions/writing-workflows/choosing-when-your-workflow-runs/events-that-trigger-workflows#schedule

on:
  schedule:
    # * is a special character in YAML so you have to quote this string
    - cron:  '0 0 1 * *'

Yes, but there's already a cron running on Digital Ocean (see the README for details) that's meant to fetch the data each month automatically. Unfortunately, the free quota is becoming too small, and I need to adjust the amount fetched.

About the 2025.06 data: that also used up too much quota and so didn't complete. I'd meant to merge #50 before 1 June, but I was travelling. I'll have to do a manual run instead.

@hugovk
Owner Author

hugovk commented Jun 13, 2025

About the 2025.06 data: that also used up too much quota and so didn't complete. I'd meant to merge #50 before 1 June, but I was travelling. I'll have to do a manual run instead.

Done re: #50 (comment)

-> https://github.com/hugovk/top-pypi-packages/releases/tag/2025.06

@reneleonhardt

reneleonhardt commented Jun 13, 2025

Thank you!
Hmm, it looks like 1 TiB isn't enough anymore for the exploding Python ecosystem; every month there's one day less of free quota 😅
No wonder, with free-threading and AI getting adopted more and more every day.

Maybe it's time to start thinking about whether the magic number 15000 should be reduced so that a full month can be shown.
I mean, the name is "top", not "top-15000"... 😉

@hugovk
Owner Author

hugovk commented Jun 13, 2025

5k or 8k or 15k or 600k doesn't make a difference!

https://hugovk.dev/blog/2024/a-surprising-thing-about-pypis-bigquery-data/#finding-the-number-of-packages-doesnt-affect-the-cost
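
The short version: BigQuery bills for the columns a query scans, and LIMIT is applied after the scan, so the row limit barely moves the estimate. A quick, hypothetical way to check for yourself, assuming pypinfo is installed: dry-run the same query with different --limit values and compare the "Data processed" lines.

# Hypothetical check: the dry-run estimates should be essentially
# identical regardless of the --limit value.
import subprocess

for limit in ("5000", "8000", "15000"):
    out = subprocess.run(
        ["pypinfo", "--all", "--days", "24", "--limit", limit,
         "--dry-run", "", "project"],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines():
        if line.startswith("Data processed:"):
            print(f"--limit {limit}: {line}")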

@reneleonhardt

There are so many "big data" tools; it's sad that no solution other than BigQuery has been chosen for counting PyPI downloads. Now that Microsoft is helping CPython development, why don't they store the download data / metadata?

So many packages are abandoned, incompatible with Python 3.13, or missing binary wheels for some platforms or architectures; identifying those problems would be more important than counting downloads, and should be provided to the community for free.

@hugovk
Owner Author

hugovk commented Jun 14, 2025

Is compatibility with 3.13 really so bad?

https://pyreadiness.org/3.13/ shows that 55% of the top 360 packages have declared compatibility by adding the 3.13 Trove classifier, but many more are compatible anyway and either don't use classifiers or haven't added/released one yet.

Are there any in particular you're missing?
