You will find two different sets of processors here (each consists of a collection of functions, as you can see in the GitHub repository):
- `mei_metadata_processor.py`: takes in a CSV or JSON file and pushes values to the MEI header
- `mei_music_feature_processor.py`: edits the MEI body in order to correct and improve music data, including problems with slurs, musica ficta, and many other features
Note that we also provide a Jupyter Notebook that you can use in a local environment or in Google Colab. It reproduces all the steps shown below.
You will need to install MEI Tools in your virtual environment in order to use them with MEI files.
Here we assume that you are doing all of this in a Jupyter Notebook, which simplifies the process of working with a folder of 'source' files (the ones you want to process) and a folder of 'output' files (the files after they have been corrected).
Here is how to install MEI Tools from a terminal in your virtual environment:

```shell
pip install git+https://github.com/RichardFreedman/mei_tools
```

From a terminal in your virtual environment, check that the tools have been installed:

```shell
python -c "import mei_tools; print('import successful')"
```

Or open a Jupyter Notebook, create a new cell, add the following to it, and run the cell:

```python
import mei_tools
```
If there are no error messages, you are ready to go!
Next you will need to call up an instance of the processor you want. The following sections explain this in detail for each.
The metadata processor takes in:
- a `source_folder` of MEI files to be updated (you also specify an `output_dir` where the processed files will go)
- a list of metadata dictionaries that provide the new data. One convenient way to create this list is to publish a Google Sheet as a CSV file (as we do here), import that sheet into Pandas, and then convert it to a list of dictionaries, in which each row is a dictionary. Here is what one of our dictionary entries looks like. The keys are the columns of our spreadsheet; the values are the contents of each cell for a given row.
```python
{'CRIM_ID': 'CRIM_Model_0001',
 'MEI_Name': 'CRIM_Model_0001.mei',
 'Title': 'Veni speciosam',
 'Mass Title': '',
 'Genre': 'motet ',
 'Composer_Name': 'Johannes Lupi',
 'CRIM_Person_ID': 'CRIM_Person_0004',
 'Composer_VIAF': 'http://viaf.org/viaf/42035469',
 'Composer_BNF_ID': 'https://data.bnf.fr/ark:/12148/cb139927263',
 'Piece_Date': ' before 1542',
 'Source_ID': 'CRIM_Source_0003',
 'Source_Short_Title': 'Musicae Cantiones',
 'Source_Title': 'Chori Sacre Virginis Marie Cameracensis Magistri, Musicae Cantiones (quae vulgo motetta nuncupantur) noviter omni studio ac diligentia in lucem editae. (8, 6, 5, et 4 vocum.) Index quindecim Cantionum. Liber tertius.',
 'Source_Publisher_1': 'Pierre Attaingnant',
 'Publisher_1_VIAF': 'http://viaf.org/viaf/59135590',
 'Publisher_1_BNF_ID': 'https://data.bnf.fr/ark:/12148/cb12230232k',
 'Source_Publisher_2': '',
 'Publisher_2_VIAF': '',
 'Publisher_2_BNF_ID': '',
 'Source_Date': '1542',
 'Source_Reference': 'RISM A I, L 3089',
 'Source_Location': 'Wien',
 'Source_Institution': 'Österreichische Nationalbibliothek',
 'Source_Shelfmark': 'SA.78.C.1/3/1-4 Mus 19',
 'Editor': 'Marco Gurrieri | Bonnie Blackburn | Vincent Besson | Richard Freedman',
 'Last_Revised': '2020_07_10',
 'Rights_Statement': 'This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License',
 'Copyright_Owner': "Centre d'Études Supérieures de la Renaissance | Haverford College | Marco Gurrieri | Bonnie Blackburn | Vincent Besson | Richard Freedman"}
```
The processor takes in each file in turn, then matches it against the list of dictionaries to find the one it needs.
Our first step with the MEI file itself is to rebuild the head (`meiHead`) element. Depending on the particular pathway used to create the MEI file (the Sibelius-to-MEI exporter, MEI Friend, the Verovio Viewer, or MuseScore), the results will be quite different: not all exporters create the head tags in the same way, although each is valid MEI.
We rebuild the MEI header to include these key elements:
- fileDesc (with information about what is contained here, including composer, title, editors, modern publisher, and rights statement)
- appInfo (how we created the file, with the MEI Updater)
- workList (repeating information about the composer and title of the music)
- manifestationList (the details of the original source, including title, date, location)
We then create or update each of these tags in turn, populating them with data from the matching `metadata_dict` and appending them to the appropriate parent element in the MEI structure. Some tags are nested within others, creating a hierarchical structure for the metadata.
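To make the rebuilding step more concrete, here is a minimal sketch of how one part of the header (a `fileDesc` containing a `titleStmt`) could be built from a single metadata dictionary with Python's standard `xml.etree.ElementTree`. This is not the mei_tools implementation: the function names `build_title_stmt` and `rebuild_file_desc`, and the choice of `persName` elements with `role` and `auth.uri` attributes, are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# MEI namespace in ElementTree's {uri}tag notation
MEI = "{http://www.music-encoding.org/ns/mei}"


def build_title_stmt(metadata: dict) -> ET.Element:
    """Hypothetical helper: build a <titleStmt> from one spreadsheet row."""
    title_stmt = ET.Element(f"{MEI}titleStmt")
    title = ET.SubElement(title_stmt, f"{MEI}title")
    title.text = metadata["Title"].strip()

    resp_stmt = ET.SubElement(title_stmt, f"{MEI}respStmt")
    composer = ET.SubElement(resp_stmt, f"{MEI}persName", role="composer")
    composer.text = metadata["Composer_Name"].strip()
    if metadata.get("Composer_VIAF"):
        # Assumption: the authority link is stored in @auth.uri
        composer.set("auth.uri", metadata["Composer_VIAF"])

    # Editors share one spreadsheet cell, separated by '|'
    for name in metadata.get("Editor", "").split("|"):
        if name.strip():
            editor = ET.SubElement(resp_stmt, f"{MEI}persName", role="editor")
            editor.text = name.strip()
    return title_stmt


def rebuild_file_desc(mei_head: ET.Element, metadata: dict) -> ET.Element:
    """Hypothetical helper: replace any existing <fileDesc> with a fresh one."""
    for old in mei_head.findall(f"{MEI}fileDesc"):
        mei_head.remove(old)
    file_desc = ET.Element(f"{MEI}fileDesc")
    file_desc.append(build_title_stmt(metadata))
    # fileDesc comes first inside meiHead
    mei_head.insert(0, file_desc)
    return file_desc
```

Removing and re-creating the element (rather than patching whatever an exporter produced) is one way to get the same header structure regardless of which tool made the file.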
The first step before running either processor is to import the necessary libraries:

```python
# Import necessary libraries
import mei_tools
from mei_tools import MEI_Metadata_Updater
from mei_tools import MEI_Music_Feature_Processor
import glob
import os
import pandas as pd
```
Next, load the metadata and convert it to a list of dictionaries. For example:

```python
# Load metadata CSV from Gsheet:
metadata_csv_url = "https://docs.google.com/spreadsheets/d/e/2PACX-1vTSspBYGhjx-UJb-lIcy8Dmxjj3c1EuBqX_IWhi2aT1MvybZ_RAn8eq7gXfjzQ_NEfEq2hCZY5y-sHx/pub?output=csv"
# a dataframe from that sheet (empty cells become '')
df = pd.read_csv(metadata_csv_url).fillna('')
# a list of dictionaries from the dataframe, one per row
crim_metadata_dict_list = df.to_dict(orient='records')
```
Then collect the MEI files to process and name the output folder. For example:

```python
mei_paths = glob.glob('MEI_IN/*')
output_folder = 'MEI_OUT'
```
Create an instance of the metadata updater, like this:

```python
metadata_updater = MEI_Metadata_Updater()
```
Optionally, you can use `dir(metadata_updater)` to see all the available methods. In fact, only one interests us: `apply_metadata`.
Now we make 'pairs' of each MEI file and its corresponding metadata dictionary and store them as a list of tuples:

```python
pairs_to_process = []
for mei_path in mei_paths:
    mei_file_name = os.path.basename(mei_path)
    # find the metadata row whose 'MEI_Name' matches this file name
    matching_dict = next((item for item in crim_metadata_dict_list if item['MEI_Name'] == mei_file_name), None)
    if matching_dict is None:
        print(f"No metadata found for {mei_file_name}; skipping")
        continue
    pairs_to_process.append((mei_path, matching_dict))
```
Finally, we run the updater on each pair, passing in the MEI file path, its matching metadata dictionary, and the output folder:

```python
for mei_path, matching_dict in pairs_to_process:
    metadata_updater.apply_metadata(mei_path, matching_dict, output_folder)
```
The `mei_music_feature_processor.py` is a modular tool: with any folder of MEI files, you have the option to run various independent correction routines. These are described in detail below, and include:
- wrapping editorial accidentals in their correct tags
- adding voice labels to the staff definitions (for use with Verovio and CRIM Intervals)
- correction of slurs to ties (when editors mistakenly encode ties as slurs)
- removal of prefatory 'incipit' staves
- removal of 'chord' elements used for ambitus in some transcriptions
- removal of empty verses (sometimes produced by conversion from other formats)
- removal of all lyrics (an extreme approach, when conversion pathways fail)
- collapsing layer elements (in which notes are mistakenly encoded as being in different voices but on the same staff)
- removal of timestamp and velocity attributes (the product of some conversion routines)
- removal of special editorial brackets used in the Senfl Edition files
The modules can be run as a set or singly.
It is not difficult to produce other modules for special needs.
Create an instance of the music feature processor, like this:

```python
music_feature_processor = MEI_Music_Feature_Processor()
```
Optionally, you can also see a list of the functions within it:

```python
dir(music_feature_processor)
```
We are only interested in `process_music_features`.
Then collect the MEI files to process and name the output folder. For example:

```python
mei_paths = glob.glob('MEI_Updates/*')
output_folder = "MEI_Final"
```
Adjust the Booleans for each module as needed, then run the processor on each file:

```python
for mei_path in mei_paths:
    music_feature_processor.process_music_features(mei_path,
                                                   output_folder,
                                                   remove_incipit=True,
                                                   remove_pb=True,
                                                   remove_sb=True,
                                                   remove_annotation=True,
                                                   remove_ligature_bracket=True,
                                                   remove_dir=True,
                                                   remove_variants=True,
                                                   remove_anchored_text=True,
                                                   remove_timestamp=True,
                                                   remove_chord=True,
                                                   check_for_chords=True,
                                                   remove_senfl_bracket=False,
                                                   remove_empty_verse=False,
                                                   remove_lyrics=False,
                                                   fix_elisions=True,
                                                   slur_to_tie=True,
                                                   collapse_layers=False,
                                                   correct_ficta=True,
                                                   voice_labels=True,
                                                   correct_cmme_time_signatures=False,
                                                   correct_jrp_time_signatures=False,
                                                   correct_mrests=True)
```
Note: We can easily add more modules based on your experience with particular MEI files.
Fix syllable elisions in the MEI files. When exported from Sibelius, elisions result in two syllable elements per note. This module finds the notes with double syllables, then reformats the two syllables as a single tag for that note. The two syllables are connected with an underscore, which renders correctly in Verovio and is valid MEI.
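A minimal sketch of the elision fix, using Python's standard `xml.etree.ElementTree`. It assumes (our assumption, not confirmed by mei_tools) that both syllables sit as `<syl>` children of the same `<verse>`; the function name `fix_elisions` is hypothetical and only mirrors the Boolean flag of the same name.

```python
import xml.etree.ElementTree as ET

MEI = "{http://www.music-encoding.org/ns/mei}"


def fix_elisions(root: ET.Element) -> int:
    """Merge double <syl> elements in a verse into one, joined by '_'."""
    merged = 0
    for verse in list(root.iter(f"{MEI}verse")):
        syls = verse.findall(f"{MEI}syl")
        if len(syls) < 2:
            continue
        # Keep the first <syl>, give it the joined text, drop the others
        syls[0].text = "_".join((s.text or "").strip() for s in syls)
        for extra in syls[1:]:
            verse.remove(extra)
        merged += 1
    return merged
```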
Replace slurs with ties in MEI files. Occasionally editors mistakenly encode ties as slurs. This module checks for these and fixes them.
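One way to detect a mis-encoded tie is to compare the pitches at both ends of a slur: a slur that joins two notes of the same pitch class and octave is very likely a tie. The sketch below applies that heuristic with `xml.etree.ElementTree`; the heuristic and the function name `slurs_to_ties` are our assumptions, not necessarily what mei_tools does. A fuller check would also compare accidentals.

```python
import xml.etree.ElementTree as ET

MEI = "{http://www.music-encoding.org/ns/mei}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def slurs_to_ties(root: ET.Element) -> int:
    """Rename <slur> to <tie> when both ends are notes of the same pitch."""
    notes = {n.get(XML_ID): n for n in root.iter(f"{MEI}note")}
    converted = 0
    for slur in list(root.iter(f"{MEI}slur")):
        # startid/endid are URI references such as "#n1"
        start = notes.get((slur.get("startid") or "").lstrip("#"))
        end = notes.get((slur.get("endid") or "").lstrip("#"))
        if start is None or end is None:
            continue
        same_pitch = (start.get("pname"), start.get("oct")) == (end.get("pname"), end.get("oct"))
        if same_pitch:
            slur.tag = f"{MEI}tie"  # startid/endid carry over unchanged
            converted += 1
    return converted
```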
Convert ficta to supplied. With the Sib_MEI export module, musica ficta is stored as text and not as a supplied element. This module fixes such errors, provided that the note to which the ficta applies is given the color red in the original Sibelius file. The function looks for accid elements associated with red notes and converts them into proper MEI supplied elements.
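A minimal sketch of the ficta conversion, assuming the red note carries `color="red"` and has its accidental as an `<accid>` child; the sketch simply wraps each such `<accid>` in a `<supplied>` element in place. The exact target encoding used by mei_tools may differ, and the function name `ficta_to_supplied` is hypothetical.

```python
import xml.etree.ElementTree as ET

MEI = "{http://www.music-encoding.org/ns/mei}"


def ficta_to_supplied(root: ET.Element) -> int:
    """Wrap the <accid> children of red notes in <supplied> elements."""
    wrapped = 0
    for note in list(root.iter(f"{MEI}note")):
        if note.get("color") != "red":
            continue
        for accid in note.findall(f"{MEI}accid"):
            index = list(note).index(accid)  # keep the original position
            supplied = ET.Element(f"{MEI}supplied")
            note.remove(accid)
            supplied.append(accid)
            note.insert(index, supplied)
            wrapped += 1
    return wrapped
```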
Remove variant elements and their contents. Some files include variant readings alongside the lemma. In some cases (for example, analysis) we want to preserve only the lemma. This module removes the variant elements.
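In MEI, variant readings are usually encoded as an `<app>` (apparatus) containing one `<lem>` (the lemma) and one or more `<rdg>` (readings). Assuming that encoding, and assuming apparatus entries are not nested, a minimal sketch of "keep only the lemma" replaces each `<app>` with the contents of its `<lem>`. The function name `keep_lemma_only` is hypothetical.

```python
import xml.etree.ElementTree as ET

MEI = "{http://www.music-encoding.org/ns/mei}"


def keep_lemma_only(root: ET.Element) -> int:
    """Replace each <app> with the children of its <lem>, dropping every <rdg>."""
    # ElementTree has no parent pointers, so build a child -> parent map
    parents = {child: parent for parent in root.iter() for child in parent}
    apps = list(root.iter(f"{MEI}app"))
    for app in apps:
        parent = parents[app]
        index = list(parent).index(app)
        lem = app.find(f"{MEI}lem")
        parent.remove(app)
        if lem is not None:
            # Splice the lemma's contents in where the <app> used to be
            for offset, child in enumerate(list(lem)):
                parent.insert(index + offset, child)
    return len(apps)
```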
Remove chord elements. These are sometimes found in XML files, and this module removes them.
Collapse layers within staff elements. Again, some files put notes on different MEI layers. This module combines those layers.
Remove empty verse elements. In some cases we find extra verse elements that nevertheless lack content. These create problems for layout with Verovio, and so we can remove them.
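A minimal sketch of empty-verse removal with `xml.etree.ElementTree`. Here we treat a `<verse>` as "empty" when none of its `<syl>` descendants contain any non-whitespace text; that definition, and the function name `remove_empty_verses`, are our assumptions.

```python
import xml.etree.ElementTree as ET

MEI = "{http://www.music-encoding.org/ns/mei}"


def remove_empty_verses(root: ET.Element) -> int:
    """Delete <verse> elements whose <syl> descendants carry no text."""
    parents = {child: parent for parent in root.iter() for child in parent}
    removed = 0
    for verse in list(root.iter(f"{MEI}verse")):
        has_text = any((syl.text or "").strip() for syl in verse.iter(f"{MEI}syl"))
        if not has_text:
            parents[verse].remove(verse)
            removed += 1
    return removed
```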
Remove anchoredText elements. Anchored text elements can create strange effects when we render files with Verovio. We can remove them with this module.
Process measure numbers after removing the incipit. Some early music files include incipits (prefatory staves) that show the original clefs and noteheads. These are normally given a label of "0" in the original file, but they can disrupt the regular measure numbers throughout the remainder of the score. This module removes the incipit and renumbers the remaining bars so that the labels and bar numbers match and start with "1".
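A minimal sketch of incipit removal and renumbering, assuming the incipit is a `<measure>` whose `@label` (or `@n`) is "0". After removing it, every remaining measure gets matching `@n` and `@label` values starting at 1. The function name `remove_incipit_and_renumber` is hypothetical.

```python
import xml.etree.ElementTree as ET

MEI = "{http://www.music-encoding.org/ns/mei}"


def remove_incipit_and_renumber(root: ET.Element) -> int:
    """Drop measures labelled "0" and renumber the rest from 1."""
    parents = {child: parent for parent in root.iter() for child in parent}
    measures = list(root.iter(f"{MEI}measure"))
    incipits = [m for m in measures if m.get("label") == "0" or m.get("n") == "0"]
    for m in incipits:
        parents[m].remove(m)
    remaining = [m for m in measures if m not in incipits]
    for number, m in enumerate(remaining, start=1):
        # Keep the bar number and the displayed label in step
        m.set("n", str(number))
        m.set("label", str(number))
    return len(incipits)
```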
Remove timestamp and velocity attributes from notes, rests, and mRests. The tstamp.real attribute might be a problem in some contexts, and so we remove it.
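A minimal sketch of the attribute clean-up with `xml.etree.ElementTree`: it deletes `@tstamp.real` and `@vel` from `<note>`, `<rest>`, and `<mRest>` elements only, leaving other elements untouched. The function name `strip_timestamps` is hypothetical.

```python
import xml.etree.ElementTree as ET

MEI = "{http://www.music-encoding.org/ns/mei}"
TARGETS = {f"{MEI}note", f"{MEI}rest", f"{MEI}mRest"}
ATTRS = ("tstamp.real", "vel")


def strip_timestamps(root: ET.Element) -> int:
    """Remove @tstamp.real and @vel from notes, rests, and mRests."""
    stripped = 0
    for el in root.iter():
        if el.tag not in TARGETS:
            continue
        for attr in ATTRS:
            if attr in el.attrib:
                del el.attrib[attr]
                stripped += 1
    return stripped
```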
Remove Senfl bracket elements. This module removes some special brackets inserted by editors of the Senfl edition.
Remove empty verse elements. Some verse elements are in fact empty, and can distort formatting with Verovio. We remove them with this module.
Remove all lyrics, including nested verse elements. Some files imported from XML or other sources have corrupted lyrics. Sometimes it is simply better to start over with text underlay in this case, and so this module removes all lyrics. The files can then be opened with MuseScore for further updates.
Add voice labels to staff definitions. It is helpful for Verovio and CRIM Intervals to have voice names as 'label' attributes in our files. This module takes care of that.
For files created by the CMME and JRP projects, this module adds the time signature attributes to the scoreDef and removes them from the staffDef.
Related to the above, JRP staffDefs have meterSig elements. This function finds those and adds the information to the scoreDef.
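The two time-signature fixes can be sketched together. In the sketch below, a CMME-style meter is read from `@meter.count`/`@meter.unit` on a `<staffDef>` (and removed there), while a JRP-style meter is read from a `<meterSig>` child's `@count`/`@unit`; either way the values are written to the `<scoreDef>`. We assume all voices share one meter, so the first one found wins. The function name `lift_meter_to_score_def` is hypothetical, and this is not the mei_tools code.

```python
import xml.etree.ElementTree as ET

MEI = "{http://www.music-encoding.org/ns/mei}"


def lift_meter_to_score_def(score_def: ET.Element):
    """Copy the meter from the staffDefs up to the scoreDef; return (count, unit)."""
    found = None
    for staff_def in score_def.iter(f"{MEI}staffDef"):
        # CMME-style: meter stored as attributes on staffDef (removed here)
        count = staff_def.attrib.pop("meter.count", None)
        unit = staff_def.attrib.pop("meter.unit", None)
        # JRP-style: meter stored in a <meterSig> child of staffDef
        sig = staff_def.find(f"{MEI}meterSig")
        if count is None and sig is not None:
            count, unit = sig.get("count"), sig.get("unit")
        if found is None and count is not None:
            found = (count, unit)
    if found is not None:
        score_def.set("meter.count", found[0])
        if found[1] is not None:
            score_def.set("meter.unit", found[1])
    return found
```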
For files exported from CMME, this module removes the bracketSpan elements used for ligatures and coloration.
Removes dir elements.
Reports the location of chord elements in each piece. It does not remove them (but see the chord removal module).
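A minimal sketch of the chord report: it walks each `<measure>` and `<staff>` and returns (measure number, staff number, chord count) for every staff that contains chords, without changing the file. The function name `report_chords` and the shape of the returned tuples are our own choices.

```python
import xml.etree.ElementTree as ET

MEI = "{http://www.music-encoding.org/ns/mei}"


def report_chords(root: ET.Element) -> list:
    """List (measure @n, staff @n, number of chords) wherever chords occur."""
    locations = []
    for measure in root.iter(f"{MEI}measure"):
        for staff in measure.findall(f"{MEI}staff"):
            count = sum(1 for _ in staff.iter(f"{MEI}chord"))
            if count:
                locations.append((measure.get("n"), staff.get("n"), count))
    return locations
```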
music21 does not correctly interpret mRest values under 3/1 mensuration. This function finds those mRests and replaces them with three semibreve (whole note) rests.
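A minimal sketch of the mRest correction, assuming the meter has already been placed on the `<scoreDef>` as `@meter.count="3"` and `@meter.unit="1"`. Each `<mRest>` in a layer is replaced by three `<rest dur="1"/>` elements (in MEI, `dur="1"` is a whole note, i.e. a semibreve). The function name `expand_mrests_in_triple_breve` is hypothetical.

```python
import xml.etree.ElementTree as ET

MEI = "{http://www.music-encoding.org/ns/mei}"


def expand_mrests_in_triple_breve(root: ET.Element) -> int:
    """Replace each <mRest> with three semibreve rests when the meter is 3/1."""
    score_def = root.find(f".//{MEI}scoreDef")
    if score_def is None:
        return 0
    if (score_def.get("meter.count"), score_def.get("meter.unit")) != ("3", "1"):
        return 0
    replaced = 0
    for layer in root.iter(f"{MEI}layer"):
        children = list(layer)
        if not any(c.tag == f"{MEI}mRest" for c in children):
            continue
        new_children = []
        for child in children:
            if child.tag == f"{MEI}mRest":
                new_children.extend(ET.Element(f"{MEI}rest", dur="1") for _ in range(3))
                replaced += 1
            else:
                new_children.append(child)
        # Rebuild the layer's children in their new order
        for child in children:
            layer.remove(child)
        layer.extend(new_children)
    return replaced
```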