eCHO: Enriching The Digital Representation of Cultural Heritage Objects

Requirements

JDK 11+ or OpenJDK 11+
Maven 3.x

Setup

  1. Download and install JDK 11 or OpenJDK 11 (or newer versions)
  2. Download and install Maven 3.x
  3. Clone the repository:
git clone https://github.com/iliedorobat/enriching-cultural-heritage-metadata.git
  4. The main language is set to Romanian by default. If the dataset uses a different language, modify the LANG_MAIN environment variable located in EnvConstants.java.
  5. Open the Terminal/Command Prompt and navigate to the root directory (the enriching-cultural-heritage-metadata directory).
  6. Generate the library:
mvn validate && mvn clean package

Collect museum datasets:

java -jar target/eCHO-1.4-jar-with-dependencies.jar --museumsCollector

Translate XML files to EDM:

  1. CIMEC to EDM:
## main command
java -jar target/eCHO-1.4-jar-with-dependencies.jar --dataType=CIMEC
## quick demo
java -jar target/eCHO-1.4-jar-with-dependencies.jar --demo --dataType=CIMEC
  2. DSPACE to EDM:
## main command
java -jar target/eCHO-1.4-jar-with-dependencies.jar --dataType=DSPACE
## quick demo
java -jar target/eCHO-1.4-jar-with-dependencies.jar --demo --dataType=DSPACE
  3. LIDO to EDM:
## main command
java -jar target/eCHO-1.4-jar-with-dependencies.jar --dataType=LIDO
## quick demo
java -jar target/eCHO-1.4-jar-with-dependencies.jar --demo --dataType=LIDO
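
The three translations differ only in the `--dataType` value and the optional `--demo` flag, so a wrapper can assemble them programmatically. The following is a minimal sketch, not part of eCHO: the `EchoCommand` class and its `build` method are hypothetical names, and only the flags and JAR path shown in the commands above are used.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of eCHO) that assembles the command lines listed above. */
class EchoCommand {
    // JAR path exactly as it appears in the commands above.
    static final String JAR = "target/eCHO-1.4-jar-with-dependencies.jar";

    /** Builds the argument list for a CIMEC, DSPACE, or LIDO run; demo adds the --demo flag. */
    static List<String> build(String dataType, boolean demo) {
        if (!List.of("CIMEC", "DSPACE", "LIDO").contains(dataType)) {
            throw new IllegalArgumentException("Unsupported data type: " + dataType);
        }
        List<String> command = new ArrayList<>(List.of("java", "-jar", JAR));
        if (demo) {
            command.add("--demo");
        }
        command.add("--dataType=" + dataType);
        return command;
    }

    public static void main(String[] args) {
        // Print the quick-demo command for each supported data type.
        for (String type : List.of("CIMEC", "DSPACE", "LIDO")) {
            System.out.println(String.join(" ", build(type, true)));
        }
    }
}
```

To actually launch a run from the repository root, the list can be passed to `new ProcessBuilder(EchoCommand.build("LIDO", false)).inheritIO().start()`.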

Normalize a time expression:

java -jar target/eCHO-1.4-jar-with-dependencies.jar --expression="1/2 sec. 3 a. chr - sec. 2 p. chr."
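
To illustrate what century normalization of such an expression can involve, here is a minimal regex-based sketch. It is not eCHO's implementation and does not reproduce its output format. It assumes that "sec." abbreviates the Romanian word for century and that "a. chr" and "p. chr" mark BC and AD respectively; it ignores qualifiers such as "1/2". The `CenturySketch` class is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch (not eCHO's code): extracts signed centuries from a time expression. */
class CenturySketch {
    // Matches e.g. "sec. 3 a. chr": group 1 is the century, group 2 is "a" (BC) or "p" (AD).
    // The reading of these abbreviations is an assumption.
    private static final Pattern CENTURY = Pattern.compile(
            "sec\\.\\s*(\\d+)\\s*([ap])\\.\\s*chr", Pattern.CASE_INSENSITIVE);

    /** Returns each century found, negative for BC and positive for AD, in order of appearance. */
    static List<Integer> centuries(String expression) {
        List<Integer> result = new ArrayList<>();
        Matcher matcher = CENTURY.matcher(expression);
        while (matcher.find()) {
            int century = Integer.parseInt(matcher.group(1));
            boolean beforeChrist = matcher.group(2).equalsIgnoreCase("a");
            result.add(beforeChrist ? -century : century);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [-3, 2]
        System.out.println(centuries("1/2 sec. 3 a. chr - sec. 2 p. chr."));
    }
}
```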

Input Datasets

CIMEC datasets:

  • Add the datasets to the files/input/cimec directory.
  • The existing datasets

DSPACE datasets:

  • Add the datasets to the files/input/dspace directory.
  • Expected layout of the DSpace storage directory:
main_directory/
    item1/
        dublin_core.xml
        contents
        contentFile1.ext
        contentFile2.ext
        ...
        contentFileM.ext
    item2/
    ...
    itemN/
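
Before running the DSPACE translation, it can help to check that every item directory follows the layout above. The following is a minimal sketch, not part of eCHO: the `DspaceLayoutCheck` class is a hypothetical name, and the code assumes that `contents` is a regular file and that `main_directory` corresponds to files/input/dspace.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical check (not part of eCHO) for the item layout shown above. */
class DspaceLayoutCheck {
    /** Returns the sorted names of item directories that lack dublin_core.xml or contents. */
    static List<String> incompleteItems(Path mainDirectory) throws IOException {
        try (Stream<Path> entries = Files.list(mainDirectory)) {
            return entries
                    .filter(Files::isDirectory)
                    // Assumption: both dublin_core.xml and contents are regular files.
                    .filter(item -> !Files.isRegularFile(item.resolve("dublin_core.xml"))
                            || !Files.isRegularFile(item.resolve("contents")))
                    .map(item -> item.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the input directory plays the role of main_directory.
        Path root = Path.of(args.length > 0 ? args[0] : "files/input/dspace");
        System.out.println("Incomplete items: " + incompleteItems(root));
    }
}
```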

LIDO datasets:

Output Datasets

CIMEC datasets: The output datasets are located in the files/output/cimec2edm directory.

  • *.rdf files contain the EDM-prepared datasets.

DSPACE datasets: The output datasets are located in the files/output/dspace2edm directory.

  • *.rdf files contain the EDM-prepared datasets.

LIDO datasets: The output datasets are located in the files/output/lido2edm directory.

  • *.rdf files contain the EDM-prepared datasets;
  • timespan_all.txt contains all identified time expressions;
  • timespan_unique.txt contains the unique identified time expressions;
  • timespan_all-analysis.csv contains the pairs of input time expression and normalized centuries;
  • timespan_unique-analysis.csv contains the unique pairs of input time expression and normalized centuries;
  • properties.csv contains the pairs of parent property and child property.

Online Datasets

Publications:

ECAI 2021: The Power of Regular Expressions in Recognizing Dates and Epochs

@InProceedings{9515139,
    author="Dorobat, Ilie Cristian and Posea, Vlad",
    booktitle="2021 13th International Conference on Electronics, Computers and Artificial Intelligence (ECAI)",
    title="The Power of Regular Expressions in Recognizing Dates and Epochs",
    year="2021",
    pages="1-3",
    doi="10.1109/ECAI52376.2021.9515139"
}

EuroMed 2020: The Usability of Romanian Open Data in the Development of Tourist Applications

@InProceedings{10.1007/978-3-030-73043-7_51,
    author="Dorobat, Ilie Cristian and Posea, Vlad",
    title="The Usability of Romanian Open Data in the Development of Tourist Applications",
    booktitle="Digital Heritage. Progress in Cultural Heritage: Documentation, Preservation, and Protection",
    year="2021",
    publisher="Springer International Publishing",
    pages="596-602",
    isbn="978-3-030-73043-7",
    doi="10.1007/978-3-030-73043-7_51"
}

EMCIS 2020: Raising the Interoperability of Cultural Datasets: The Romanian Cultural Heritage Case Study

@InProceedings{10.1007/978-3-030-63396-7_3,
    author="Dorobat, Ilie Cristian and Posea, Vlad",
    title="Raising the Interoperability of Cultural Datasets: The Romanian Cultural Heritage Case Study",
    booktitle="Information Systems",
    year="2020",
    publisher="Springer International Publishing",
    pages="35-48",
    isbn="978-3-030-63396-7",
    doi="10.1007/978-3-030-63396-7_3"
}

ECAI 2020: Evolving the DSpace Storage into Linked Data

@InProceedings{9223189,
    author="Dorobat, Ilie Cristian and Posea, Vlad",
    booktitle="2020 12th International Conference on Electronics, Computers and Artificial Intelligence (ECAI)",
    title="Evolving the DSpace Storage into Linked Data",
    year="2020",
    pages="1-5",
    doi="10.1109/ECAI50035.2020.9223189"
}

TPDL 2019: Enriching the Cultural Heritage Metadata Using Historical Events: A Graph-Based Representation

@InProceedings{10.1007/978-3-030-30760-8_30,
    author="Dorobat, Ilie Cristian and Posea, Vlad",
    title="Enriching the Cultural Heritage Metadata Using Historical Events: A Graph-Based Representation",
    booktitle="Digital Libraries for Open Knowledge",
    year="2019",
    publisher="Springer International Publishing",
    pages="344-347",
    isbn="978-3-030-30760-8",
    doi="10.1007/978-3-030-30760-8_30"
}
