Scripts to support NYPL submission and reconciliation of metadata for the Google Books/HathiTrust project.
0.1.0
Always use a virtual environment. After activating it, invoke pip
to install the package:
$ pip install git+https://github.com/BookOps-CAT/google-books.git
To use this application, activate your virtual environment.
The package uses CLI commands to run each process. All scripts are launched by invoking the google-books
command in your preferred command line tool.
All commands have the following pattern: google-books [OPTIONS] COMMAND [ARGS]
Use the help function to learn about all available options:
$ google-books --help
See the detailed walkthrough, which includes instructions on how to use the google-books tool, in this Google doc.
To store all associated files for a particular shipment, create a folder in files/shipments/
using the following command:
$ google-books new-shipment [YYYYMMDD]
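As a rough illustration of what the date argument implies, the sketch below validates a YYYYMMDD string and derives the matching folder path under files/shipments/. The helper name `shipment_dir` is hypothetical and not part of the package; the actual command may validate input or create folders differently.

```python
from datetime import datetime
from pathlib import Path


def shipment_dir(date_str: str, base: str = "files/shipments") -> Path:
    """Return the folder path for a shipment dated YYYYMMDD (hypothetical helper)."""
    # Require exactly eight digits so short forms such as "2023111" are rejected.
    if len(date_str) != 8 or not date_str.isdigit():
        raise ValueError(f"expected a date in YYYYMMDD format, got {date_str!r}")
    # strptime raises ValueError for dates that do not exist, e.g. 20230230.
    datetime.strptime(date_str, "%Y%m%d")
    return Path(base) / date_str
```

Calling `shipment_dir("20231123").mkdir(parents=True, exist_ok=True)` would then create the folder if it is missing.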
A Sierra export that includes Google patron account numbers and barcodes must be cleaned up before submitting it to Google. Use the following command where YYYYMMDD is the date of the shipment and folder:
$ google-books onsite-manifest [YYYYMMDD]
This command produces the NYPL_YYYYMMDD.txt
manifest file.
ReCAP staff upload a manifest to Google Drive. Use it to select the relevant barcodes and create a list in Sierra from them. This list is required to prepare the MARCXML metadata file. Use the following command to extract barcodes for the Create List:
$ google-books recap-manifest
The command creates google-recap-barcodes-YYYYMMDD.csv
in the shipment folder. This list is used in the Data Exchange module.
$ google-books hathi-report [FILE PATH]
Google's reconciliation report can be used to remove from the MARCXML file intended for HathiTrust any records for materials that have not been scanned. Expect to receive the reconciliation report by email about 2 months after the shipment to Google. Only then remove the records for items that were not digitized and submit the processed MARCXML file to Zephir.
$ google-books hathi-metadata-prep [MARCXML IN PATH] [GOOGLE FO REPORT PATH] [MARCXML OUT PATH]
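The filtering step described above can be sketched with the standard library. This is only an illustration under stated assumptions: the set of scanned barcodes is taken as already extracted from Google's report, and the item barcode is assumed to sit in field 945, subfield i, which may not match the actual export. The function name `drop_unscanned` is hypothetical.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"  # standard MARCXML namespace
NS = {"marc": MARC_NS}


def drop_unscanned(collection, scanned, tag="945", code="i"):
    """Remove records with no barcode in `scanned`; return the number removed.

    The default tag and subfield code are assumptions made for this sketch.
    """
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    removed = 0
    # Iterate over a copy because records are removed from the parent in the loop.
    for record in list(collection.findall("marc:record", NS)):
        barcodes = {sf.text.strip() for sf in record.findall(path, NS) if sf.text}
        if not barcodes & scanned:
            collection.remove(record)
            removed += 1
    return removed
```

A caller would load the input with `ET.parse(path).getroot()`, pass the root `collection` element together with the set of scanned barcodes, and write the tree back out afterwards.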
Use MARC21 exports from Sierra to fix records that do not have an OCLC # in the control number field (001 MARC tag). Records that have OCLC identifiers in the 035 field or 991$y will have the 001 replaced with the OCLC # and the 003 tag encoded accordingly. This process deletes any existing 991 fields from the records.
The file processed this way can then be reloaded into Sierra to overwrite the original records.
$ google-books oclc [MARC21 FILE PATH]
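One part of this process, pulling an OCLC number out of an 035 value, might look like the sketch below. It assumes the common `(OCoLC)` prefix convention for 035$a, with an optional ocm/ocn/on prefix, and strips leading zeros. The helper name `oclc_number` is hypothetical, and the package's own normalization rules may differ.

```python
import re
from typing import Optional

# Matches values such as "(OCoLC)1234567" or "(OCoLC)ocm01234567" (assumed format).
OCLC_035 = re.compile(r"^\(OCoLC\)(?:ocm|ocn|on)?0*(\d+)$")


def oclc_number(value: str) -> Optional[str]:
    """Return the OCLC number as digits without leading zeros, or None."""
    match = OCLC_035.match(value.strip())
    return match.group(1) if match else None
```

Values with a different prefix, such as a local system number, return None and would leave the record unchanged in this sketch.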
https://catalog.hathitrust.org/Record/[cid]
https://babel.hathitrust.org/cgi/pt?id=nyp.[barcode]
http://hdl.handle.net/2027/nyp.[barcode]
example:
https://catalog.hathitrust.org/Record/100405490
https://babel.hathitrust.org/cgi/pt?id=nyp.33433105117174
http://books.google.com/books?vid=NYPL:[Barcode]
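The URL patterns above can be filled in programmatically. The sketch below is a minimal illustration. The function name `item_links` and the dictionary keys are made up for this example and are not part of the package.

```python
def item_links(cid: str, barcode: str) -> dict:
    """Build HathiTrust and Google Books URLs from the patterns listed above."""
    return {
        "catalog": f"https://catalog.hathitrust.org/Record/{cid}",
        "reader": f"https://babel.hathitrust.org/cgi/pt?id=nyp.{barcode}",
        "handle": f"http://hdl.handle.net/2027/nyp.{barcode}",
        "google_books": f"http://books.google.com/books?vid=NYPL:{barcode}",
    }
```

For instance, `item_links("100405490", "33433105117174")["catalog"]` gives the catalog URL from the example above.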
- verify that the HathiTrust links in 856 fields were added to the correct bibs from Nov 23
- add notes about material's poor condition to items in Sierra based on GRIN's rejections (condition 25)
- add more tests & increase test coverage
- complete documentation