Sciscinet-v2 is a refreshed update to SciSciNet, a large-scale, integrated dataset designed to support research in the science of science domain.

Northwestern-CSSI/sciscinet

Sciscinet-v2

Sciscinet-v2 is a refreshed update to SciSciNet, a large-scale, integrated dataset designed to support research in the science of science domain. It links scientific publications to their funding sources, patents, citations, and institutional affiliations, creating a rich ecosystem for analyzing scientific productivity, impact, and innovation.

About Sciscinet-v2

The newer version, Sciscinet-v2, is rebuilt using the latest snapshot from OpenAlex. Here are some informative comparisons of Sciscinet-v2 with Sciscinet-v1.

FAQ about Sciscinet-v2

1. How does Sciscinet-v2 compare to the previous version?

2. Does Sciscinet-v2 include precomputed metrics and linkages?

3. Do you have precomputed embeddings for papers?

Due to the sheer size of the embeddings (1.7 TB), the chunked embeddings are hosted on Google Cloud Storage. The easiest way to access them is with the following command:
$ gsutil -m cp -r gs://sciscinet-neo/v2/embeddings/* ./path/to/your/directory/
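
Because the full set is about 1.7 TB, it can help to preview the command and check the total size before starting the transfer. The following is a minimal sketch, assuming gsutil is installed and authenticated. The function name fetch_embeddings and the DRY_RUN variable are hypothetical and not part of Sciscinet-v2; only the bucket path comes from the command above.

```shell
# Hypothetical helper (not part of Sciscinet-v2): copy the embedding chunks
# into a local directory, or only print the command when DRY_RUN=1.
EMBEDDINGS_URI="gs://sciscinet-neo/v2/embeddings"

fetch_embeddings() {
  local dest="${1:?usage: fetch_embeddings <destination-directory>}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print what would run without touching the network or the disk.
    echo "gsutil -m cp -r ${EMBEDDINGS_URI}/* ${dest}/"
    return 0
  fi
  mkdir -p "${dest}"
  # Summarize the total size first so the ~1.7 TB download is not a surprise.
  gsutil du -s -h "${EMBEDDINGS_URI}"
  # -m parallelizes the transfer; -r recurses into the embeddings prefix.
  # The wildcard is quoted so that gsutil, not the local shell, expands it.
  gsutil -m cp -r "${EMBEDDINGS_URI}/*" "${dest}/"
}
```

Running DRY_RUN=1 fetch_embeddings ./embeddings prints the copy command without executing it; dropping DRY_RUN=1 performs the actual download.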

4. I do not see sciscinet_paperdetails.parquet or sciscinet_papertitleabstract.parquet on Hugging Face. Where can I find them?

sciscinet_paperdetails.parquet (117 GB) and sciscinet_papertitleabstract.parquet (92 GB) are hosted exclusively on Google Cloud Storage and BigQuery because of their size. You can locate them with the following commands:

$ gsutil ls gs://sciscinet-neo/v2 | grep -e "sciscinet_paperdetails"
gs://sciscinet-neo/v2/sciscinet_paperdetails.parquet

$ gsutil ls gs://sciscinet-neo/v2 | grep -e "sciscinet_papertitleabstract"
gs://sciscinet-neo/v2/sciscinet_papertitleabstract.parquet
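
The listing above only confirms where the files live. To copy one of them locally, something like the following sketch could be used, assuming gsutil is installed and authenticated. The helper names sciscinet_uri and fetch_table are hypothetical, and only the two file names shown in the listing above are confirmed to exist under this prefix.

```shell
# Hypothetical helpers (not part of Sciscinet-v2) for the files listed above.
SCISCINET_BUCKET="gs://sciscinet-neo/v2"

# Build the object URI for a table name such as "sciscinet_paperdetails".
sciscinet_uri() {
  printf '%s/%s.parquet\n' "${SCISCINET_BUCKET}" "$1"
}

# Copy one table into a destination directory (defaults to the current one).
fetch_table() {
  local table="${1:?usage: fetch_table <table-name> [destination-directory]}"
  local dest="${2:-.}"
  mkdir -p "${dest}"
  gsutil cp "$(sciscinet_uri "${table}")" "${dest}/"
}
```

For example, fetch_table sciscinet_paperdetails ./data would copy the 117 GB paper details file into ./data, so it is worth checking free disk space first.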

5. I do not see sciscinet_journals as in v1. Where can I find journal information?

Since Sciscinet-v2 is built on top of OpenAlex, journal information is stored in the form of Sources. The equivalent information is in the file sciscinet_sources.parquet, and the paper-to-source mapping is in sciscinet_papersources.parquet.

Access Sciscinet-v2

More information can be found at https://northwestern-cssi.github.io/sciscinet/.

Authors

Project contributors and maintainers: Akhil, Zihang Lin, and Yifan Qian

PI

Dashun Wang

License

MIT License
