A BigQuery-based system for evaluating the impact of research publications, modelled on Excellence in Research for Australia (ERA).
Contact / Enquiries: coki@curtin.edu.au
For detailed documentation, see: install | configure | usage.
```bash
# run with Docker
docker run --rm -it cokicurtin/ries:latest
# then, at the container prompt:
node . help          # list available commands
node . compile_all   # compile the SQL scripts

# or install and run locally
git clone https://github.com/Curtin-Open-Knowledge-Initiative/coki-ries.git ries
cd ries
npx pnpm install
node . help
node . compile_all
```
Excellence in Research for Australia (ERA) is a periodic assessment conducted by the Australian Research Council (ARC). It focuses on the research activity of 42 Australian higher education providers (HEPs) across 236 ANZSRC fields of research (FoR). Performance is assessed per HEP and FoR by comparing research outputs against local and world benchmarks. The analysis is citation-focused and draws on publication metadata supplied by the participating HEPs.
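As a rough illustration of what comparing outputs against a benchmark can mean, published ERA methods include a relative citation impact (RCI) indicator, which divides an output's citation count by the average citation count for its field and publication year. The notation below is ours, and this is only a sketch; the exact benchmarks and indicators built by this project are described in the Method documentation.

```math
\mathrm{RCI}_i = \frac{c_i}{\bar{c}_{f,y}},
\qquad
\bar{c}_{f,y} = \frac{1}{\lvert P_{f,y} \rvert} \sum_{j \in P_{f,y}} c_j
```

Here $c_i$ is the number of citations to output $i$, and $P_{f,y}$ is the benchmark set of outputs in field $f$ published in year $y$: all outputs worldwide for a world benchmark, or outputs from Australian HEPs for a local benchmark. An RCI above 1 means the output is cited more than the benchmark average.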
The Curtin Open Knowledge Initiative (COKI) aggregates bibliometric and bibliographic data from publicly available sources such as Crossref, Unpaywall, OpenCitations, Microsoft Academic Graph, and OpenAlex. The resultant BigQuery database contains metadata for over 120 million research publications and forms the foundation for further analysis by the COKI team.
This software project demonstrates how the COKI database can be used to run an ERA-like analysis. The methodology follows published ERA methods and uses journal-level metadata from the ERA 2023 Journal List. The workflows can be extended beyond the ERA scope to include any institution with a ROR identifier and any research-topic vocabulary that has been assigned to research articles (e.g., via machine-learning classifiers).
This codebase is free and open-source software (FOSS); however, access to the COKI database is limited. For evaluation purposes, a subset of the COKI dataset has been extracted and made available via Google Cloud Storage. The subset is limited to metadata for approximately two million journal articles that:
- were published in 2016,
- could be linked to at least one research institution (via ROR identifier), and
- could be linked to a journal in the ERA 2023 Journal List (via ISSN).
To compile the SQL scripts, you will need a workstation with either Docker or Node.js. To run the queries and build a demo database, you will also need access to your own BigQuery instance. Follow the installation instructions to get started.
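Before following the installation instructions, you can quickly confirm that one of the two supported toolchains is available. The script below is a minimal sketch and is not part of the repository; the function name `has_any` is hypothetical, and the script only checks that a `docker` or `node` executable is on your `PATH`. It does not check versions or BigQuery access.

```bash
#!/usr/bin/env bash
# Hypothetical prerequisite check (not part of the RIES codebase).
# Compiling the SQL scripts needs Docker or Node.js; running the
# queries also needs your own BigQuery instance, which is not checked here.

# has_any CMD...: succeed if at least one named command is on PATH.
has_any() {
  local cmd
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      return 0
    fi
  done
  return 1
}

if has_any docker node; then
  echo "Docker or Node.js found: you can compile the SQL scripts."
else
  echo "Install Docker or Node.js to compile the SQL scripts." >&2
fi
```

Save it under any name (e.g. `check-prereqs.sh`) and run it with `bash check-prereqs.sh`.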
- Installation - system requirements and installation (Docker, OS X or Linux)
- Configuration - description of configuration options
- Usage - command line interface instructions
- Roadmap - future development roadmap
- Workflow - detailed workflow diagram
- Method - description of the methods used to build benchmarks & indicators
Within this repository, each directory contains a README.md file that provides context. The top-level directories are:
| directory | description |
|---|---|
| ./code | Application code including libraries, SQL templates, ETL scripts and workflows. |
| ./data | Scratch area for working data, caches and temp files. Not under version control. |
| ./docs | System and method documentation. |
| ./setup | Installer scripts and configuration settings. |
The full COKI dataset is recompiled weekly by the Academic Observatory Workflows, running on the Academic Observatory Platform. The underlying infrastructure requires significant resources, so we do not currently make the data freely available (the codebases, however, are FOSS).
To ensure the sustainable development and continuation of this project, our medium-term goal is to establish an institutional membership model. We are seeking expressions of interest from institutions that would benefit from further development of an on-demand ERA-like analytical system. Such a system would aim to provide value to institutions by simplifying the curation of research-output metadata and facilitating the exploration of alternative analytical methods: for example, reporting on how Australian HEPs perform against other institutions with respect to Open Access publication.
We are also happy to discuss possible collaboration opportunities, analysis services or access models with interested individuals and institutions.
Once the ERA Transition Working Group releases its findings and recommendations for the future of ERA assessments, we will have greater clarity on how best to proceed.
Copyright 2022 Curtin University
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Conceptualization: Julian Tonti-Filippini and Cameron Neylon.
Data curation: Julian Tonti-Filippini.
Formal analysis: Julian Tonti-Filippini.
Funding acquisition: Cameron Neylon.
Investigation: Julian Tonti-Filippini.
Methodology: Julian Tonti-Filippini and Cameron Neylon.
Project administration: Kathryn Napier and Cameron Neylon.
Resources: Cameron Neylon.
Software: Julian Tonti-Filippini and Cameron Neylon.
Supervision: Kathryn Napier and Cameron Neylon.
Validation: Julian Tonti-Filippini.
Visualisation: Julian Tonti-Filippini.
Writing - original draft: Julian Tonti-Filippini and Cameron Neylon.
Writing - review & editing: Julian Tonti-Filippini, Kathryn Napier, and Cameron Neylon.