Add publication records from a csv file #472

@jacobthill

We need to scope what would be required to add publications from a CSV file. We have several known cases (the Law School, Business School, and Medical School) where admins manually collect and review publication data for researchers. These are highly precise publication sets that are reviewed by the faculty members and used to meet internal reporting requirements. It would greatly enhance RIALTO as a service if we could capture this work by loading these publications into our database. It would also eliminate the need for time-consuming investigations into why department X has more publications in their internal set than our RIALTO dashboard shows.

These publications would likely need to be treated like approved publications in SUL-Pub. We would want to load them early in the DAG and enrich the metadata with fields from other sources (e.g. Dimensions, OpenAlex, etc.).

There is a way to make Airflow monitor a Google Drive folder. This could be a good way of capturing these publications, e.g. create a folder for each campus group (school, department, lab, etc.) that maintains its own publications. The group can then refresh the file whenever they like and we will pick it up on the next DAG run.

Metadata normalization:
The easiest way for us to do this would be to enforce a particular data model, e.g. the CSV file has to have these specific columns with these specific names or it won't work. In order to do this, it would be helpful to look at the fields that each of the three schools listed above is tracking and figure out how to normalize them. We would also want to make it easy to add new fields in the future if another campus department comes along with a new field in their CSV file.
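To make the "specific columns with specific names" idea concrete, here is a minimal sketch using only the Python standard library. The required column names (`doi`, `title`, `author`) and the alias table are placeholders, not a proposal for the real data model; the actual set would come from comparing what the three schools track. Unknown columns are kept under a normalized name, which is one possible way to let a new department field through without a code change.

```python
import csv
import io

# Hypothetical required columns; the real list would come from reviewing
# the fields the Law, Business, and Medical Schools actually track.
REQUIRED_COLUMNS = {"doi", "title", "author"}

# Hypothetical aliases mapping department-specific headers to canonical names.
COLUMN_ALIASES = {"Publication Title": "title", "Author Name": "author"}


def normalize_header(name):
    """Map a raw CSV header to its canonical (lowercase, underscored) name."""
    stripped = name.strip()
    return COLUMN_ALIASES.get(stripped, stripped.lower().replace(" ", "_"))


def read_publications(csv_file):
    """Return rows as dicts keyed by canonical names.

    Raises ValueError if any required column is missing, so a bad file
    fails fast instead of loading partial data.
    """
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None:
        raise ValueError("CSV file is empty")
    header = [normalize_header(h) for h in reader.fieldnames]
    missing = REQUIRED_COLUMNS - set(header)
    if missing:
        raise ValueError(f"CSV is missing required columns: {sorted(missing)}")
    # Re-key the remaining data rows with the canonical header names.
    reader.fieldnames = header
    return list(reader)
```

With this shape, supporting a new department header means adding an entry to `COLUMN_ALIASES` rather than changing the loader.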

Deduplication:
For now we should keep publications with a DOI and throw the rest out until we figure out how to de-duplicate publications without a DOI.
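A rough sketch of the DOI-only rule, assuming rows arrive as dicts with a `doi` key (a hypothetical column name). The normalization here (lowercasing and stripping common resolver prefixes) is an assumption about how the CSV values might vary, not an established RIALTO convention. Rows without a usable DOI are returned separately so they could be logged rather than silently lost.

```python
# Common prefixes people paste in front of a DOI; stripped before comparing.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw):
    """Return a lowercase bare DOI, or None if the value does not look like one."""
    if not raw:
        return None
    doi = raw.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    # Very loose sanity check: DOIs start with "10." and contain a slash.
    return doi if doi.startswith("10.") and "/" in doi else None


def dedupe_by_doi(rows):
    """Keep the first row per normalized DOI; collect rows lacking a DOI."""
    kept = {}
    dropped = []
    for row in rows:
        doi = normalize_doi(row.get("doi"))
        if doi is None:
            dropped.append(row)
        elif doi not in kept:
            kept[doi] = {**row, "doi": doi}
    return list(kept.values()), dropped
```

Keeping the first occurrence is an arbitrary choice for the sketch; a real loader might prefer the most complete row instead.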
