Continuously matches realtime transit data in the VDV-454 structure against a GTFS Schedule dataset and generates GTFS Realtime (GTFS-RT) data.
> [!TIP]
> If you're just looking for VBB's publicly deployed GTFS-RT feed, refer to the VBB deployment's readme.
> [!TIP]
> Although `gtfs-rt-feed` can be used standalone, it is intended to be used in tandem with `vdv-453-nats-adapter` – which pulls the input VDV-454 data from a VDV-453/-454 API – and `nats-consuming-gtfs-rt-server` – which combines the `DIFFERENTIAL`-mode GTFS-RT data sent by `gtfs-rt-feed` into a single non-differential feed and serves it via HTTP.
>
> For more details about the architecture `gtfs-rt-feed` has been designed for, refer to the VBB deployment's readme.
It uses the PostGIS GTFS importer to import the GTFS Schedule data into a new PostgreSQL database whenever the dataset has changed.
This service reads both VDV-454 `REF-AUS` `SollFahrt`s and VDV-454 `AUS` `IstFahrt`s from a NATS message queue (as JSON instead of XML):
```js
// REF-AUS SollFahrt
// To be more readable, this example only contains essential fields. In practice, there are more.
{
	"LinienID": "M77",
	"UmlaufID": "1234",
	"FahrtID": {
		"FahrtBezeichner": "9325_877_8_2_19_1_1806#BVG",
		"Betriebstag": "2024-09-20",
	},
	"SollHalts": [
		{
			"HaltID": "900073281",
			"Abfahrtszeit": "2024-09-20T12:41:00Z",
		},
		{
			"HaltID": "900073236",
			"Ankunftszeit": "2024-09-20T12:43:00Z",
			"Abfahrtszeit": "2024-09-20T12:45:00Z",
		},
		// Usually there are more SollHalts, but the SollFahrt may not be complete.
	],
}
```
```js
// AUS IstFahrt
// Again, non-essential fields are omitted.
{
	"LinienID": "M77",
	"LinienText": "M77",
	"FahrtID": {
		"FahrtBezeichner": "9325_877_8_2_19_1_1806#BVG",
		"Betriebstag": "2024-09-20",
	},
	"Komplettfahrt": null,
	"IstHalts": [
		{
			"HaltID": "900073236",
			"Ankunftszeit": "2024-09-20T12:43:00Z",
			"Abfahrtszeit": "2024-09-20T12:45:00Z",
			"IstAnkunftPrognose": "2024-09-20T13:46:00+01:00", // 3 minutes delay
			"IstAbfahrtPrognose": "2024-09-20T13:47:00+01:00", // 2 minutes delay
		},
		// Sometimes there are more IstHalts, but the IstFahrt may also contain just one IstHalt.
	],
}
```
For each trip "instance" (e.g. the M77 above, starting at `2024-09-20T12:41:00Z`), there may be

- a `REF-AUS` `SollFahrt`, delineating the scheduled (read: as intended by the transport company's medium-term planning, i.e. taking into account construction work, strikes, etc.) sequence of stops. – These messages (there can be multiple per trip "instance") are typically sent at the beginning of the schedule day, early in the morning.
- 0 or more `AUS` `IstFahrt`s with all `IstHalt`s, as indicated by their `Komplettfahrt=true` flag, delineating the prognosed sequence of stops. – These messages are typically sent right before the first departure of and during a trip "instance". Besides providing prognosed arrival/departure times, they also express cancelled and added stops; they are considered exhaustive descriptions of the trip "instance". Only the most recent is kept for each trip "instance".
- 0 or more partial `AUS` `IstFahrt`s, as indicated by the lack of `Komplettfahrt=true`, expressing realtime changes just to those stops that they contain `IstHalt`s for. For each stop of each trip "instance", the most recent is kept.
For a single trip "instance", both the number of messages of each kind and their order are unknown. This is why `gtfs-rt-feed`

- persists all of these messages in a key-value store (Redis), so that,
- whenever a new message is received, it can query all previous ones concerning the same trip "instance", and
- merge them into a single new `IstFahrt` structure, "layering" the realtime data from the received `AUS` `IstFahrt`s on top of the schedule data from the received `REF-AUS` `SollFahrt` – see the sketch below.
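To illustrate, the layering might look roughly like this (a simplified sketch under the assumptions of the JSON examples above, not `gtfs-rt-feed`'s actual implementation):

```js
// Simplified sketch of the "layering" merge described above – not the actual implementation.
// sollFahrt: the trip "instance"'s REF-AUS SollFahrt (or null)
// istFahrten: all AUS IstFahrts received for it so far, oldest first
const mergeVdvMessages = (sollFahrt, istFahrten) => {
	// start with the scheduled stops, keyed by HaltID
	const haltsById = new Map()
	for (const sollHalt of sollFahrt?.SollHalts ?? []) {
		haltsById.set(sollHalt.HaltID, {...sollHalt})
	}
	// layer the realtime IstHalts on top, so that newer messages win
	for (const istFahrt of istFahrten) {
		for (const istHalt of istFahrt.IstHalts ?? []) {
			haltsById.set(istHalt.HaltID, {
				...(haltsById.get(istHalt.HaltID) ?? {}),
				...istHalt,
			})
		}
	}
	return {
		...sollFahrt,
		IstHalts: Array.from(haltsById.values()),
	}
}
```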
After merging, the `IstFahrt` is transformed into a GTFS-RT `TripUpdate`, so that subsequent code only has to deal with GTFS-RT concepts.
```js
// Again, this example has been shortened for readability.
{
	"trip": {},
	"stop_time_update": [
		{
			"stop_id": "900073281",
			"departure": {
				"time": 1726836360,
				"delay": 300,
			},
		},
		{
			"stop_id": "900073236",
			"arrival": {
				"time": 1726836360,
				"delay": 180,
			},
			"departure": {
				"time": 1726836420,
				"delay": 120,
			},
		},
	],
	// not part of the GTFS Realtime spec, we just use it for matching and/or debug-logging
	[kRouteShortName]: "M77",
}
```
Within the imported GTFS Schedule data, `gtfs-rt-feed` then tries to find trip "instances" that

- have the same `route_short_name` (`M77`),
- for at least two `IstHalt`s, stop at (roughly) the same scheduled time (`2024-09-20T12:41:00Z`) at (roughly) the same stop (`900073281`).
If there is exactly one such GTFS Schedule trip "instance", we call it a match. If there are two or more trip "instances", we consider the match ambiguous and not specific enough, so we stop processing the `IstFahrt`.
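As an illustration, the fuzzy "(roughly) the same stop at (roughly) the same time" comparison might look like this (a hypothetical sketch; `departureTimestamp`, the `endsWith`-based stop matching and the one-minute threshold are illustrative assumptions, not `gtfs-rt-feed`'s actual logic):

```js
// Hypothetical sketch of the fuzzy stop & time comparison described above.
const MAX_TIME_DIFF_MS = 60 * 1000 // illustrative threshold: 1 minute

// e.g. "de:11000:900073281" (GTFS stop_id) vs "900073281" (VDV-454 HaltID)
const stopsRoughlyMatch = (istHalt, gtfsStopTime) =>
	gtfsStopTime.stop_id.endsWith(istHalt.HaltID)

// gtfsStopTime.departureTimestamp: scheduled departure as a UNIX timestamp (in seconds)
const timesRoughlyMatch = (istHalt, gtfsStopTime) =>
	Math.abs(Date.parse(istHalt.Abfahrtszeit) - gtfsStopTime.departureTimestamp * 1000) <= MAX_TIME_DIFF_MS
```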
The GTFS Schedule trip "instance" is then formatted as a GTFS-RT `TripUpdate` (it contains no realtime data). Then the schedule `TripUpdate` and the matched realtime `TripUpdate` get merged into a single new `TripUpdate`.
```js
// Again, this example has been shortened for readability.
{
	"trip": {
		"trip_id": "1234567",
		"route_id": "17462_700",
	},
	"stop_time_update": [
		{
			"stop_id": "de:11000:900073281",
			// Note that `arrival` has been filled in from schedule data.
			"arrival": {
				"time": 1726836060,
			},
			"departure": {
				"time": 1726836360,
				"delay": 300,
			},
		},
		{
			"stop_id": "de:11000:900073236",
			"arrival": {
				"time": 1726836360,
				"delay": 180,
			},
			"departure": {
				"time": 1726836420,
				"delay": 120,
			},
		},
	],
	// not part of the GTFS Realtime spec, we just use it for matching and/or debug-logging
	[kRouteShortName]: "M77",
}
```
This whole process, which we call matching, is done continuously for each VDV-454 `SollFahrt`/`IstFahrt` received from NATS.
There is a Docker image available:

```shell
# pull the Docker image …
docker pull ghcr.io/opendatavbb/gtfs-rt-feed

# … or install everything manually (you will need Node.js & npm).
git clone https://github.com/OpenDataVBB/gtfs-rt-feed.git gtfs-rt-feed
cd gtfs-rt-feed
npm install --omit dev
# check out the Git submodules & install their dependencies
git submodule update --checkout
cd postgis-gtfs-importer && npm install --omit dev
```
> [!IMPORTANT]
> Although `gtfs-rt-feed` is intended to be data-source-agnostic, just following the GTFS Schedule and GTFS-RT specs, it currently has some hard-coded assumptions specific to the VBB deployment it has been developed for. Please create an Issue if you want to use `gtfs-rt-feed` in another setting.
`gtfs-rt-feed` needs access to the following services to work:

- a NATS message queue with JetStream enabled
- a PostgreSQL database server, with the permission to dynamically create new databases (see postgis-gtfs-importer's readme)
- a Redis in-memory cache, version 8.0.0 or later (Valkey currently doesn't support the `HSETEX` command)
`gtfs-rt-feed` uses `pg` to connect to PostgreSQL; for details about supported environment variables and their defaults, refer to `pg`'s docs.

To make sure that the connection works, use `psql` from the same context (same permissions, same container if applicable, etc.).
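For example, because `pg` supports the standard libpq environment variables, a local connection might be configured like this (all values are placeholders):

```shell
# standard libpq/pg environment variables – all values are placeholders
export PGHOST=localhost
export PGPORT=5432
export PGUSER=postgres
export PGPASSWORD=password
export PGDATABASE=postgres
# verify the connection from the same context that gtfs-rt-feed will run in
psql -c 'SELECT 1'
```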
`gtfs-rt-feed` uses `nats` to connect to NATS. You can use the following environment variables to configure access:

- `$NATS_SERVERS` – list of NATS servers (e.g. `localhost:4222`), separated by `,`
- `$NATS_USER` & `$NATS_PASSWORD` – if you need authentication
- `$NATS_CLIENT_NAME` – the connection name

By default, `gtfs-rt-feed` will connect as `gtfs-rt-$MAJOR_VERSION` to `localhost:4222` without authentication.
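For example (host names and credentials are placeholders):

```shell
export NATS_SERVERS='nats-1.example.org:4222,nats-2.example.org:4222'
export NATS_USER='gtfs-rt-feed'
export NATS_PASSWORD='my_secret_password'
export NATS_CLIENT_NAME='gtfs-rt-feed-dev'
```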
We also need to create two NATS JetStream streams called `REF_AUS_SOLLFAHRT_2` and `AUS_ISTFAHRT_2` that `gtfs-rt-feed` will read (unmatched) VDV-454 `REF-AUS` `SollFahrt` and `AUS` `IstFahrt` messages from, respectively. This can be done using the NATS CLI:
```shell
# Flags used below:
#   --defaults: omit this if you want to configure more details
#   --subjects: collect all messages published to these subjects
#   --ack: acknowledge publishes
#   --retention=limits --discard=old: with limited storage, discard the oldest messages first
# The last argument is the name of the stream.
nats stream add \
	--defaults \
	--subjects='ref_aus.sollfahrt.>' \
	--ack \
	--retention=limits --discard=old \
	--description='VDV-454 REF-AUS SollFahrt messages' \
	REF_AUS_SOLLFAHRT_2
nats stream add \
	--defaults \
	--subjects='aus.istfahrt.>' \
	--ack \
	--retention=limits --discard=old \
	--description='VDV-454 AUS IstFahrt messages' \
	AUS_ISTFAHRT_2
```
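You can verify that both streams have been created:

```shell
# list all streams known to the NATS server
nats stream ls
```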
On both streams, we create one durable consumer each, called `gtfs-rt-feed`:
```shell
# Flags used below:
#   --defaults: omit this if you want to configure more details
#   --pull: create a pull-based consumer (refer to the NATS JetStream docs)
#   --ack=explicit: let gtfs-rt-feed explicitly acknowledge all received messages
#   --deliver=new: let the newly created consumer start with the latest messages in the stream (not all)
#   --max-pending=200: send gtfs-rt-feed at most 200 messages at once
#   --max-deliver=3 --backoff=…: when & how often to re-deliver a message that hasn't been acknowledged (usually because it couldn't be processed)
# The last two arguments are the names of the stream and the consumer.
nats consumer add \
	--defaults \
	--pull \
	--ack=explicit \
	--deliver=new \
	--max-pending=200 \
	--max-deliver=3 \
	--backoff=linear \
	--backoff-steps=2 \
	--backoff-min=15s \
	--backoff-max=2m \
	--description 'OpenDataVBB/gtfs-rt-feed' \
	REF_AUS_SOLLFAHRT_2 \
	gtfs-rt-feed
nats consumer add \
	--defaults \
	--pull \
	--ack=explicit \
	--deliver=new \
	--max-pending=200 \
	--max-deliver=3 \
	--backoff=linear \
	--backoff-steps=2 \
	--backoff-min=15s \
	--backoff-max=2m \
	--description 'OpenDataVBB/gtfs-rt-feed' \
	AUS_ISTFAHRT_2 \
	gtfs-rt-feed
```
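Likewise, you can verify the consumers:

```shell
# list the consumers of each stream
nats consumer ls REF_AUS_SOLLFAHRT_2
nats consumer ls AUS_ISTFAHRT_2
```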
Next, again using the NATS CLI, we'll create a stream called `GTFS_RT_2` that the `gtfs-rt-feed` service will write (matched) GTFS-RT messages into:
```shell
# Flags used below:
#   --defaults: omit this if you want to configure more details
#   --subjects: collect all messages published to these subjects
#   --ack: acknowledge publishes
#   --retention=limits --discard=old: with limited storage, discard the oldest messages first
# The last argument is the name of the stream.
nats stream add \
	--defaults \
	--subjects='gtfsrt.>' \
	--ack \
	--retention=limits --discard=old \
	--description='GTFS-RT messages' \
	GTFS_RT_2
```
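Later, once `gtfs-rt-feed` is running, you can watch the matched GTFS-RT messages it publishes:

```shell
# subscribe to all subjects collected by the GTFS_RT_2 stream
nats sub 'gtfsrt.>'
```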
`gtfs-rt-feed` uses `ioredis` to connect to Redis; for details about supported environment variables and their defaults, refer to its docs.
> [!TIP]
> You should allow Redis to use at least a few hundred megabytes of memory. With the VBB deployment, we limit it to 2GB.
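For example, when starting Redis manually, such a limit could be set like this (a sketch, not necessarily how your deployment configures Redis):

```shell
# limit Redis' memory usage to 2GB
redis-server --maxmemory 2gb
```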
Make sure your GTFS Schedule dataset is available via HTTP without authentication. Configure the URL using `$GTFS_DOWNLOAD_URL`. Optionally, you can configure the `User-Agent` being used for downloading by setting `$GTFS_DOWNLOAD_USER_AGENT`.
The GTFS import script will

- download the GTFS dataset;
- import it into a separate database called `gtfs_$timestamp_$gtfs_hash` (each revision gets its own database);
- keep track of the latest successfully imported database's name in a meta "bookkeeping" database (`$PGDATABASE` by default).
Refer to postgis-gtfs-importer's docs for details about why this is done and how it works.
Optionally, you can

- activate gtfstidy-ing before import using `GTFSTIDY_BEFORE_IMPORT=true`;
- postprocess the imported GTFS dataset using custom SQL scripts by putting them in `$PWD/gtfs-postprocessing.d`, as shown below.
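For example (the script name is purely illustrative):

```shell
export GTFSTIDY_BEFORE_IMPORT=true
# SQL scripts in this directory will be run against the imported dataset
mkdir -p gtfs-postprocessing.d
cp my-fixes.sql gtfs-postprocessing.d/
```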
Refer to the import script for details about how to customize the GTFS Schedule import.
```shell
export GTFS_DOWNLOAD_URL='…'

# Run import using Docker …
./import.sh --docker
# … or run import using ./postgis-gtfs-importer.
./import.sh
```
Once the import has finished, you must set `$PGDATABASE` to the name of the newly created database.
```shell
export PGDATABASE="$(psql -q --csv -t -c 'SELECT db_name FROM latest_import')"
```
> [!NOTE]
> If you're running `gtfs-rt-feed` in a continuous (service-like) fashion, you'll want to run the GTFS Schedule import regularly, e.g. once per day. `postgis-gtfs-importer` won't import again if the dataset hasn't changed.
>
> Because how to schedule the import – and how to modify `$PGDATABASE` for the `gtfs-rt-feed` process afterwards – highly depends on your deployment strategy and preferences, this repo doesn't contain any tool for that.
>
> As an example, VBB's deployment uses a systemd timer to schedule the import and a systemd service drop-in file to set `$PGDATABASE`.
```shell
# Run using Docker …
# (In production, use the container deployment tool of your choice.)
# Note: Pass through all other required environment variables (NATS, Redis, etc.) as well.
docker run --rm -it \
	-e PGDATABASE \
	ghcr.io/opendatavbb/gtfs-rt-feed

# … or manually.
# (During development, pipe the logs through `./node_modules/.bin/pino-pretty`.)
node index.js
```
todo:

- `$LOG_LEVEL`
- `$LOG_LEVEL_MATCHING`
- `$LOG_LEVEL_FORMATTING`
- `$LOG_LEVEL_STATION_WEIGHT`
- `$METRICS_SERVER_PORT`
- `$MATCHING_CONCURRENCY`
- `$MATCH_GTFS_RT_TO_GTFS_CACHING`
- `$MATCHING_CONSUMER_NAME`
- `$MATCHING_PUBLISH_UNMATCHED_TRIPUPDATES`
- `$PG_POOL_SIZE`
The example `docker-compose.yml` starts up a complete set of containers (`vbb-gtfs-rt-server` and all of its dependencies: PostgreSQL & NATS).
> [!WARNING]
> The Docker Compose setup is only intended as a quick demo on how to run `gtfs-rt-feed` and its dependency services.

Be sure to set `POSTGRES_PASSWORD`, either via a `.env` file or an environment variable.
```shell
POSTGRES_PASSWORD=my_secret_password docker-compose up
```
`gtfs-rt-feed` writes pino-formatted log messages to `stdout`, so you can use pino-compatible tools to process them.
`gtfs-rt-feed` exposes Prometheus-compatible metrics via HTTP. By default, the metrics server will listen on a random port. You can configure a permanent port using `$METRICS_SERVER_PORT`.
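For example (assuming `$METRICS_SERVER_PORT` is set to `9090`; the `/metrics` path follows the usual Prometheus convention and is an assumption here):

```shell
# fetch the current metrics
curl 'http://localhost:9090/metrics'
```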
The following kinds of metrics will be exported:

- domain-specific metrics, e.g.
  - the number of successful/failed/errored matchings
  - DB/cache query timings
- technical details about the Node.js process, e.g. the current state of garbage collection
Refer to the Grafana dashboard in VBB's deployment for an example of how to visualize `gtfs-rt-feed`'s metrics.
This project is ISC-licensed.
Note that PostGIS GTFS importer, one of the service's dependencies, is EUPL-licensed.