Gregorovius API is the main backend application for the Gregorovius Correspondence Edition. It acts as an API layer on top of eXist-db and can be consumed as a web service by any application. Gregorovius API is based on FastAPI, delb and snakesist and implements an API configuration model proposed by Martin Fechner in 2018.
The Gregorovius Correspondence Edition is developed by the German Historical Institute in Rome in collaboration with the Berlin-Brandenburg Academy of Sciences and Humanities, with funding from the German Research Foundation and the Gerda Henkel Foundation.
- Make sure you have Poetry and Python 3.6 installed
- Install dependencies by running the following in the root directory:

      $ poetry install

- Make sure you have an eXist instance running locally on db:8080 (eXist 4.7.1, ideally, or lower). If "db" doesn't cut it, adjust the host name as you please in app/controller.py
- Start the server (the reload flag is optional):

      $ poetry run uvicorn app:main --reload

- Run the test suite as needed:

      $ poetry run pytest
config.yml is the manifest file which determines the content and structure of the data served by the web service.
First, a few things to consider when setting up config.yml:
- The file must be valid YAML, of course
- Please stick to a two-space indentation
Currently the manifest file is tailored for eXist-db and uses XPath expressions to determine what data needs to be queried.
You can provide a path to the eXist collection at the root of your project to limit the scope of queries to that collection, e.g.

    collection: /db/projects/gregorovius/data
If you want to allow XSLT (1.0) transformations for your XML endpoints, you can set the XSLT option to True. XSLT is disabled by default.

    xslt: True
When using the XSLT transformation feature, please note that the stylesheets processed by the app are restricted for security reasons: The body of an XSLT request must contain a stylesheet stripped of its root node.
Under the entities: block you define the items which will become your API endpoints. For instance, let's say we need two endpoints, letters and persons:

    entities:
      letters:
        xpath: '//*:TEI[@*:doctype="letter"]'
      persons:
        xpath: '//*:TEI[@*:doctype="person_index"]//*:person'
        properties:
          name:
            xpath:
              - './persName[@type="reg"]'
Notice how the root XPath differs from the deeper one in properties. First of all, the root XPath is a string, while the other ones are arrays. Root XPath expressions also must have namespace prefixes while the others must not. For now, a namespace prefix wildcard (*:) should do the trick. Certain predefined prefixes (tei:, xml: etc.) should be made available in a future version.
If you want to specify namespaces in tag names or attribute keys in your config.yml, use the full XML namespace URI. For example, if you want to specifically get the value of xml:id, you can do it like this:

    properties:
      comments:
        xpath: ['.//seg/note']
        attrib: ['{http://www.w3.org/XML/1998/namespace}id']
        multiple: True
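The {namespace-URI}name form shown above is the same qualified-name notation that Python XML libraries such as ElementTree and lxml use. As a minimal sketch with the standard library (the sample fragment and the ID value are made up for illustration and are not taken from the edition's data):

```python
import xml.etree.ElementTree as ET

# The xml: prefix is bound to this URI by the XML namespaces specification.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical fragment; real edition data will look different.
fragment = '<seg><note xml:id="n1">A comment</note></seg>'
seg = ET.fromstring(fragment)

# Look up xml:id via its fully qualified attribute key.
note = seg.find("./note")
note_id = note.get(f"{{{XML_NS}}}id")
print(note_id)
```

Writing the key with the full URI avoids any dependence on which prefix a given document happens to declare.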
Properties, like the ones seen in the example above, can be nested. Each of them can have an xpath: line pointing to the property value by using an array of XPath expressions. If they don't, any deeper XPath will be resolved relative to the entity root. By default, a property yields a single value. If you want multiple values as an array, you need to set the multiple: True option.

    properties:
      name:
        xpath:
          - './persName[@type="reg"]'
          - './persName[@type="alt"]'
        multiple: True
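To illustrate the difference between the default and multiple: True, here is a rough sketch using the standard library's ElementTree rather than the delb-based code in the app. The helper resolve_property, the sample person, and the choice of returning the first match in single-value mode are all assumptions made for this example:

```python
import xml.etree.ElementTree as ET


def resolve_property(entity, xpaths, multiple=False):
    """Hypothetical helper: collect text values for a list of XPath expressions."""
    values = []
    for xpath in xpaths:
        # Each expression is evaluated relative to the entity node.
        for node in entity.findall(xpath):
            text = "".join(node.itertext()).strip()
            if text:
                values.append(text)
    if multiple:
        return values
    # Assumption: single-value mode returns the first match, or None.
    return values[0] if values else None


# Made-up sample entity, not taken from the edition's data.
person = ET.fromstring(
    "<person>"
    '<persName type="reg">Doe, Jane</persName>'
    '<persName type="alt">J. Doe</persName>'
    "</person>"
)
xpaths = ['./persName[@type="reg"]', './persName[@type="alt"]']

single = resolve_property(person, xpaths)
many = resolve_property(person, xpaths, multiple=True)
print(single)
print(many)
```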
An XPath yields the text content of the child nodes. If you want to extract an attribute value, insert an attr option:

    persons:
      xpath: '//*:TEI[@*:doctype="person_index"]//*:person'
      properties:
        name:
          xpath:
            - './persName[@type="reg"]'
          attr: ['key']
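Conceptually, the attr option switches the extracted value from the node's text to one of its attributes. A small sketch of that idea with the standard library (resolve_attribute and the sample key value are hypothetical and only serve the illustration; the app itself may work differently):

```python
import xml.etree.ElementTree as ET


def resolve_attribute(entity, xpath, attr):
    """Hypothetical helper: return an attribute of the first matching node."""
    node = entity.find(xpath)
    return None if node is None else node.get(attr)


# Made-up sample entity with an invented key value.
person = ET.fromstring(
    '<person><persName type="reg" key="P0001">Doe, Jane</persName></person>'
)

key = resolve_attribute(person, './persName[@type="reg"]', "key")
print(key)
```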
The property ID extracted by default is the @xml:id of your entity node. Other fallback attributes for the ID are not available at the moment, but are planned for a future version.
Note: Please refer to the eXist-db Full Text Index documentation
A basic fulltext search API endpoint can be configured using the search_index key to set up text parameters for the Lucene configuration. The following configuration block

    search_index:
      text:
        - pattern: "tei:text"
          type: "qname"
          inline-qname: "tei:ex"
          ignore: "tei:note"
        - pattern: "//tei:p"
          type: "match"
          inline-qname: "tei:ex"
          ignore: "tei:note"

will create the following Lucene configuration in eXist-db:

    <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
    <text qname="tei:text">
      <inline qname="tei:ex"/>
      <ignore qname="tei:note"/>
    </text>
    <text match="//tei:p">
      <inline qname="tei:ex"/>
      <ignore qname="tei:note"/>
    </text>
Note that the current implementation contains defaults and hard-coded values. This is an experimental feature for now. Please use with care.
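The mapping from search_index entries to Lucene text elements can be pictured roughly as follows. This is only a sketch of the correspondence visible in the example above, not the app's actual code: build_text_elements is a hypothetical helper, and the analyzer element and any hard-coded defaults are left out.

```python
import xml.etree.ElementTree as ET


def build_text_elements(search_index):
    """Hypothetical helper: turn search_index text entries into <text> elements."""
    elements = []
    for entry in search_index.get("text", []):
        # The type ("qname" or "match") becomes the attribute name,
        # the pattern becomes its value.
        text = ET.Element("text", {entry["type"]: entry["pattern"]})
        if "inline-qname" in entry:
            ET.SubElement(text, "inline", {"qname": entry["inline-qname"]})
        if "ignore" in entry:
            ET.SubElement(text, "ignore", {"qname": entry["ignore"]})
        elements.append(text)
    return elements


# The same configuration as in the YAML example, written as a Python dict.
search_index = {
    "text": [
        {"pattern": "tei:text", "type": "qname",
         "inline-qname": "tei:ex", "ignore": "tei:note"},
        {"pattern": "//tei:p", "type": "match",
         "inline-qname": "tei:ex", "ignore": "tei:note"},
    ]
}

elements = build_text_elements(search_index)
for element in elements:
    print(ET.tostring(element, encoding="unicode"))
```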
- Place all the facsimile images in the img/hd directory (ignored)
- Run poetry run bin/convert_images to create webp derivatives
- Upload the img/webp directory to the production server
- Open the gesamtdatenbank.xslx with a spreadsheet editor of your choice and save it as semicolon-separated CSV
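A semicolon-separated export can be checked quickly with Python's csv module by setting the delimiter explicitly. The sketch below uses an in-memory sample instead of the real file; the column names and values are invented, since the actual layout of the spreadsheet is not described here:

```python
import csv
import io

# Hypothetical sample standing in for the exported CSV file.
sample = "id;sender;date\n1;Doe, Jane;1870-01-01\n"

# With ";" as the delimiter, commas inside a field need no quoting.
reader = csv.DictReader(io.StringIO(sample), delimiter=";")
rows = list(reader)

for row in rows:
    print(row["id"], row["sender"], row["date"])
```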