gedcom-5to7

This is an open-source Java implementation of a GEDCOM 5.5.1 to GEDCOM 7.0 converter. It is a fork of java-converter, which its author, Luther Tychonievich, released into the public domain.

The aim of this fork is to publish a somewhat polished, Maven-buildable version. I will also try to complete the missing functionality (see below).

Usage

Standalone

mvn package
java -jar target/gedcom-5to7-1.0.2.jar data/gedcom551.ged
java -jar target/gedcom-5to7-1.0.2.jar data/gedcom551.ged > data/gedcom7.ged

As dependency

Add the repository and the dependency to your application's pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xmlns="http://maven.apache.org/POM/4.0.0"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	...
	<repositories>
		<repository>
			<id>jitpack.io</id>
			<url>https://jitpack.io</url>
		</repository>
	</repositories>
	<dependencies>
		...
		<dependency>
			<groupId>com.github.cbettinger</groupId>
			<artifactId>gedcom-5to7</artifactId>
			<version>1.0.2</version>
		</dependency>
	</dependencies>
</project>

Parse a GEDCOM 5.5.1 file and write a GEDCOM 7 file:

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import bettinger.gedcom5to7.Converter;
import bettinger.gedcom5to7.Converter.ConvertException;

...

try (final OutputStream output = new FileOutputStream(target)) {
	final Converter converter = Converter.parse(source);
	converter.write(output);
} catch (final ConvertException e1) {
	System.err.println(e1.toString());
} catch (final IOException e2) {
	System.err.println(e2.toString());
}

Current Status

This implements all of the major pieces of a 5.5.1-to-7.0 converter. Some tests were performed during development, but not enough to be confident that it is free of bugs.

To-Do

Done

  • Detect character encodings, as documented in ELF Serialisation.
  • Convert to UTF-8
  • Normalize line whitespace, including stripping leading spaces
  • Remove CONC
  • Fix @ usage
  • Limit character set of cross-reference identifiers
  • Normalize case of tags
  • Convert DATE
    • replace date_phrase with PHRASE structure
    • replace calendar escapes with calendar tags
    • change BC and B.C. to BCE and remove if found in unsupported calendars
    • replace dual years with single years and PHRASEs
    • replace just-year dual years in unqualified date with BET/AND
  • Convert AGE
    • change age words to canonical forms (stillborn as 0y, child as < 8y, infant as < 1y) with PHRASEs
    • Normalize spacing in AGE payloads
    • add missing y
  • change SOUR with text payload into pointer to SOUR with NOTE
  • change OBJE with no payload to pointer to new OBJE record
  • change NOTE record or NOTE with pointer payload into SNOTE
    • use heuristic to change some pointer-NOTE to nested-NOTE instead of SNOTE
  • Convert LANG payloads to BCP 47 tags, using FHISO's mapping
  • tag renaming, including
    • EMAI, _EMAIL → EMAIL
    • FORM.TYPE → FORM.MEDI
    • (deferred) _SDATE → SDATE -- _SDATE is also used as "accessed at" date for web resources by some applications so this change is not universally correct
    • _UID → UID
    • _ASSO → ASSO
    • _CRE, _CREAT → CREA
    • _DATE → DATE
    • ASSO.RELA → ASSO.ROLE
    • other?
  • Enumerated values
    • Normalize case
    • Convert user-text to PHRASEs
  • change RFN, RIN, and AFN to EXID
  • change _FSFTID, _APID to EXID
  • Convert MEDI.FORM payloads to media types
  • Convert FONE and ROMN to TRAN and their TYPEs to BCP-47 LANGs
  • change FILE payloads into URLs
    • Windows-style \ becomes /
    • Windows drive letter C:\WINDOWS becomes file:///c:/WINDOWS
    • POSIX-style /User/foo becomes file:///User/foo
  • remove SUBN, HEAD.FILE, HEAD.CHAR
  • update the GEDC.VERS to 7.0
  • Change any illegal tag XYZ into _EXT_XYZ
    • or to _XYZ and add a SCHMA entry for it
    • leave unchanged under extensions
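
The FILE payload rules listed above can be illustrated with a minimal sketch. This is not the converter's actual code: the class FilePayloadToUrl and the method toUrl are hypothetical names chosen for illustration. The sketch covers only the three listed cases; it does no percent-encoding and leaves relative paths untouched apart from the slash replacement.

```java
public class FilePayloadToUrl {

    /** Hypothetical helper, not part of the gedcom-5to7 API. */
    public static String toUrl(String payload) {
        // Windows-style \ becomes /
        String path = payload.replace('\\', '/');

        // Windows drive letter: C:/WINDOWS becomes file:///c:/WINDOWS
        if (path.length() >= 3
                && Character.isLetter(path.charAt(0))
                && path.charAt(1) == ':'
                && path.charAt(2) == '/') {
            return "file:///" + Character.toLowerCase(path.charAt(0)) + path.substring(1);
        }

        // POSIX-style absolute path: /User/foo becomes file:///User/foo
        if (path.startsWith("/")) {
            return "file://" + path;
        }

        // Anything else is returned with only the slash replacement applied
        return path;
    }

    public static void main(String[] args) {
        System.out.println(toUrl("C:\\WINDOWS"));      // file:///c:/WINDOWS
        System.out.println(toUrl("/User/foo"));        // file:///User/foo
        System.out.println(toUrl("media\\photo.jpg")); // media/photo.jpg
    }
}
```

A real implementation would likely also need to escape characters that are not allowed in URLs, such as spaces; that step is left out here.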

Updating to new GEDCOM definitions

The folder src/main/resources contains copies of the TSV definition files from https://github.com/FamilySearch/GEDCOM/, https://github.com/fhiso/legacy-format/ and https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry that are used at runtime.

These can be updated by running

javac DownloadDefinitions.java
java DownloadDefinitions

from the project's root directory.

DownloadDefinitions.java is otherwise unneeded, and should not be included in distributions of the gedcom-5to7 package.
