[FEATURES] add additional information for a package #187
Comments
It would make sense to also query GitHub for those for which we have the data. On the Yarn website we do this on the frontend once the detail page has been requested. For requesting GitHub data we'll need some API key rotation like npms does: source. It would also make the replication slower, but that would be fine IMO; it's just the API usage limits that I haven't been able to overcome so far. |
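The key rotation mentioned above could look roughly like the sketch below. It is not how npms implements it; `KeyRotator`, its method names, and the default budget of 5000 requests per key per hour are assumptions made for illustration.

```typescript
// Hypothetical helper: hands out API keys round-robin and skips keys whose
// hourly budget is used up. The 5000 default is an assumed per-key limit.
class KeyRotator {
  private readonly keys: string[];
  private remaining: number[];
  private cursor = 0;

  constructor(keys: string[], budgetPerKey: number = 5000) {
    this.keys = keys;
    this.remaining = keys.map(() => budgetPerKey);
  }

  // Returns the next key that still has budget, or null if all are exhausted.
  next(): string | null {
    for (let tried = 0; tried < this.keys.length; tried++) {
      const i = (this.cursor + tried) % this.keys.length;
      if (this.remaining[i] > 0) {
        this.remaining[i] -= 1;
        this.cursor = (i + 1) % this.keys.length;
        return this.keys[i];
      }
    }
    return null;
  }

  // Call at the start of each rate-limit window to restore every budget.
  reset(budgetPerKey: number = 5000): void {
    this.remaining = this.keys.map(() => budgetPerKey);
  }
}
```

When `next()` returns `null`, the replication would have to pause until the rate-limit window resets, which is where the slowdown comes from.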
One thing that I would also love to add is jsDelivr hits in the last month. Right now the search results are sorted by npm downloads, which is great for Node.js/backend packages, but doesn't work that well for browser/frontend packages, especially those that recommend a CDN as the primary installation option. It would be nice if we could either combine those numbers somehow, or simply have an option for sorting by CDN hits rather than npm downloads. Implementation should be as easy as using either this or this endpoint of our API. |
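A minimal sketch of the two options (a combined number, or a switchable sort key). The field names, the `SortMode` values, and the plain sum used for the combined score are all assumptions; a real ranking would probably need to weight CDN hits and downloads differently.

```typescript
// Assumed record shape; neither field name comes from the actual index.
interface PackageStats {
  name: string;
  npmDownloads: number; // npm downloads in the last month
  jsDelivrHits: number; // jsDelivr hits in the last month
}

type SortMode = "npm" | "cdn" | "combined";

// Picks the number to rank by. "combined" is a naive sum for illustration.
function popularity(pkg: PackageStats, mode: SortMode): number {
  switch (mode) {
    case "npm":
      return pkg.npmDownloads;
    case "cdn":
      return pkg.jsDelivrHits;
    case "combined":
      return pkg.npmDownloads + pkg.jsDelivrHits;
  }
}

// Returns a new array sorted from most to least popular.
function sortByPopularity(pkgs: PackageStats[], mode: SortMode): PackageStats[] {
  return pkgs.slice().sort((a, b) => popularity(b, mode) - popularity(a, mode));
}
```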
Is it possible to get the monthly downloads batched, like this but for a specified set of 100/200 packages at a time, so we can look them all up? |
There's ?page, so you can just do ?page=1, ?page=2, ... until you get no results. |
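The paging loop described above can be sketched as follows. The page fetcher is injected so that no endpoint URL has to be guessed; `fetchPage`, `HitCount`, and the `maxPages` safety cap are hypothetical names, not part of the jsDelivr API.

```typescript
// Assumed shape of one result row.
interface HitCount {
  name: string;
  hits: number;
}

// Requests ?page=1, ?page=2, ... through the injected fetcher and stops at
// the first empty page. maxPages guards against a server that never runs dry.
async function fetchAllPages(
  fetchPage: (page: number) => Promise<HitCount[]>,
  maxPages: number = 10000
): Promise<HitCount[]> {
  const all: HitCount[] = [];
  for (let page = 1; page <= maxPages; page++) {
    const results = await fetchPage(page);
    if (results.length === 0) break; // an empty page marks the end
    for (const row of results) all.push(row);
  }
  return all;
}
```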
I won't have time for now to add this, but feel free to contribute or contact me if I can help. It indeed could be possible to augment the index like that, although the flow we currently have is
So we'd need to control which packages would be in the batch (maybe something that can be added to your API first?) |
Those are valid concerns. Adding the jsDelivr downloads seems "easy". As for the GitHub API requests, that's a bit more tricky because of what @Haroenv said. Then you also have the freshness issue: we currently rebuild the full index every week. If we provide things like the number of open issues, you might want the data to be a little fresher. That might require a bit more work to optimise the data pipeline (today it takes one day to completely rebuild the index; I am sure we can bring that down, but we never investigated it much). Still, if you already have ideas on how to do it well, please do contribute, make it faster, anything :) |
Since the data is rather compact (just package name and one number), I think we could get everything at once and have it stored in memory during the indexing process. Our API currently computes stats for all packages every time (which is why it is that slow), even though it only gives you 100 results at once, so it would make more sense if we removed that limit and you'd be able to get all numbers in one request (we're talking about 2.5 MB of data per 100k packages and currently we have ~20k packages). |
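A minimal sketch of the in-memory approach, assuming the stats arrive as `[name, hits]` pairs. The row shape, the `jsDelivrHits` field, and the default of 0 for packages without CDN traffic are assumptions; the 25 bytes per package is simply 2.5 MB divided by 100k.

```typescript
// Assumed shape of one row from the stats API: [package name, monthly hits].
type HitsRow = [string, number];

// Assumed shape of an index record; only the name is needed here.
interface IndexRecord {
  name: string;
  jsDelivrHits?: number;
}

const BYTES_PER_PACKAGE = 25; // 2.5 MB per 100k packages

// Rough payload size for a given number of packages.
function estimateBytes(packageCount: number): number {
  return packageCount * BYTES_PER_PACKAGE;
}

// Loads all rows into a lookup table kept in memory while indexing.
function buildHitsMap(rows: HitsRow[]): Map<string, number> {
  return new Map(rows);
}

// Copies each record and attaches its hit count, defaulting to 0.
function augment(records: IndexRecord[], hits: Map<string, number>): IndexRecord[] {
  return records.map((record) => ({
    ...record,
    jsDelivrHits: hits.get(record.name) ?? 0,
  }));
}
```

With about 20k packages this comes to roughly 0.5 MB, so holding the whole table in memory during an indexing run should be cheap. A `Map` is used instead of a plain object so that package names such as `constructor` cannot collide with built-in properties.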
Seems possible, feel free to do a PR @MartinKolarik :) |
@Haroenv 👍 unfortunately I don't have the time right now either but hopefully later... |
@vvo @Haroenv as per the opened PRs and issues, I think it makes more sense to do it client side, indeed. I'll figure out a way to do it on https://ngx.tools. When it comes to augmenting the npm-search index with additional information from GitHub, I'll throw some "random" ideas here (I might be wrong in some aspects):
npm has ≈ 700k packages. So, 700k / 5k = 140 hours ≈ 6 days: it would take about 6 days to process the 700k packages using one GitHub API key, at 5k packages per hour. We could speed this up by using 6 GitHub API keys and do it in about 1 day. Right? We could even run these calls inside a Cloud Function and not pay for the infrastructure (lots of cloud providers have a free tier; GCP offers 2 million free calls / month). Alternatively, we could use GraphQL to query GitHub, since one GraphQL call can replace multiple REST calls. A single complex GraphQL call could be the equivalent of thousands of REST requests. I'm sure we'll figure it out ^_^ |
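The estimate above can be written as a small helper. It assumes exactly one API request per package and an even split of work across keys; `hoursToCrawl` is a hypothetical name.

```typescript
// Hours needed to make one request per package, given a per-key hourly limit
// and a number of keys used in parallel.
function hoursToCrawl(
  packages: number,
  requestsPerKeyPerHour: number,
  keys: number
): number {
  return packages / (requestsPerKeyPerHour * keys);
}

const oneKeyHours = hoursToCrawl(700000, 5000, 1); // 140 hours, about 5.8 days
const sixKeysHours = hoursToCrawl(700000, 5000, 6); // about 23.3 hours
```

If each package needs more than one request (for example stars and issues fetched separately over REST), the hours scale up by that factor.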
Thanks for the nice architecture thoughts!
I did try out the GitHub GraphQL API. It felt a bit strange at the beginning (I had to understand the actual GraphQL language), but afterwards it feels super nice. Example: https://github.com/vvo/zorgs/blob/master/src/zorgs/src/queries/repositoriesWithCommits.js |
Very nice. In the same fashion, we could get the number of stars and issues like so:
Which would give us:
|
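A minimal sketch of what a query for stars and open issues could look like, not necessarily the one originally posted. It assumes the GitHub GraphQL API's `repository`, `stargazers { totalCount }`, and `issues(states: OPEN) { totalCount }` fields, and uses aliases to ask for several repositories in one call, which is the batching idea raised earlier in the thread. `buildRepoStatsQuery` and `RepoRef` are hypothetical names.

```typescript
// Owner and name of a repository, e.g. parsed from a package's repository URL.
interface RepoRef {
  owner: string;
  name: string;
}

// Builds one GraphQL document that asks for stars and open issues of several
// repositories at once, using aliases (r0, r1, ...) to tell the results apart.
// JSON.stringify is used to quote and escape the string arguments.
function buildRepoStatsQuery(repos: RepoRef[]): string {
  const fields = repos.map(
    (repo, i) =>
      `  r${i}: repository(owner: ${JSON.stringify(repo.owner)}, name: ${JSON.stringify(repo.name)}) {\n` +
      `    stargazers { totalCount }\n` +
      `    issues(states: OPEN) { totalCount }\n` +
      `  }`
  );
  return `query {\n${fields.join("\n")}\n}`;
}
```

Each alias in the response would then carry a `stargazers.totalCount` and an `issues.totalCount` for its repository. How many repositories fit into one call depends on GitHub's GraphQL rate and node limits, which this sketch does not try to model.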
Seems useful! Would be nice to try out if we can get this merged, can be added in |
Hi guys. I've got requests for some new additions to the npm-search index. Let's discuss them here:
Does this make sense?