Curtin-Open-Knowledge-Initiative/what_do_we_lose_mag

What do we lose when MAG goes away?

Code, data and slides for the presentation 'What do we lose when MAG goes away?', given online at CWTS on September 3, 2021 by Cameron Neylon and Bianca Kramer (see announcement).

Presentation abstract

Microsoft announced in June that the Microsoft Academic Graph (MAG) product is going to be retired at the end of 2021. MAG has become an important resource for many seeking to explore how open data might replace existing proprietary data sources, as well as for the exploratory development of new capabilities. It provided wide disciplinary coverage and is among the most comprehensive of the freely available and openly licensed datasets.

A common strategy for many projects and services (e.g. Unpaywall, Lens, Semantic Scholar, the Curtin Open Knowledge Initiative and others) has been to use MAG to enrich Crossref metadata. Crossref metadata is open, but does not have the same coverage as MAG, both in terms of objects (e.g. books and articles without DOIs) and metadata completeness (e.g. the proportion of objects with associated affiliations, abstracts or citations). MAG also provides specific elements that Crossref does not, including a subject classification that is useful in many contexts.

In this talk we will discuss our strategy and initial results examining what we will lose when MAG is retired. While the existing data will remain available, as it was provided as open data, it will rapidly become stale. The completeness of Crossref metadata provided by publishers is improving, but not at a rate that will replace the lost MAG coverage within the next few years. In addition, MAG data are themselves incomplete, and there are gaps that neither Crossref nor MAG fills.

How big are the gaps? How can they be measured? And what strategies could be developed to fill them? What risks are there for the various projects that are offering to replace MAG and for those who are relying on them? What other sources of open metadata can play a role? And perhaps most importantly, what prospects are there for leveraging the plans of Crossref and others to provide a route for community-sourced data to develop long-term solutions for the provision of trusted, sustainable and accurate open metadata for the future?

Repository content

This repository contains the code and data for the analysis, including the queries and processing and a local copy of the derived data.
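
As an illustration of the kind of completeness measure described in the abstract (the proportion of objects with associated affiliations, abstracts or citations), the following is a minimal sketch in Python. It is not the repository's actual queries or processing. The record layout, the field names (`doi`, `affiliations`, `abstract`, `citations`) and the toy records are all assumptions made up for this example.

```python
from typing import Dict, Iterable, Mapping, Sequence

# Hypothetical field names; real Crossref and MAG schemas differ.
FIELDS = ("affiliations", "abstract", "citations")


def completeness(records: Iterable[Mapping], fields: Sequence[str] = FIELDS) -> Dict[str, float]:
    """Proportion of records with a non-empty value for each field."""
    records = list(records)
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in records if r.get(field)) / total
        for field in fields
    }


def without_doi(records: Iterable[Mapping]) -> int:
    """Number of records that carry no DOI (e.g. some books and articles)."""
    return sum(1 for r in records if not r.get("doi"))


# Toy records, invented for illustration only.
crossref = [
    {"doi": "10.1000/a", "affiliations": ["Example University"],
     "abstract": "", "citations": ["10.1000/b"]},
    {"doi": "10.1000/b", "affiliations": [], "abstract": "", "citations": []},
]
mag = [
    {"doi": "10.1000/a", "affiliations": ["Example University"],
     "abstract": "Some text", "citations": ["10.1000/b"]},
    {"doi": "10.1000/b", "affiliations": ["Other Institute"],
     "abstract": "", "citations": []},
    {"doi": None, "affiliations": ["Example University"],
     "abstract": "More text", "citations": []},
]

if __name__ == "__main__":
    print("Crossref completeness:", completeness(crossref))
    print("MAG completeness:", completeness(mag))
    print("MAG records without a DOI:", without_doi(mag))
```

Comparing the two dictionaries field by field gives a rough picture of where one source would need enrichment from the other. The count of records without a DOI hints at object coverage that a DOI-based source cannot provide.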

The presentation slides are rendered to https://curtin-open-knowledge-initiative.github.io/what_do_we_lose_mag

License

Apache-2.0 (LICENSE)
MIT (revealjs_LICENSE)
