geobleu

License: MIT

Python implementation of GEO-BLEU, a similarity evaluation method for trajectories

https://dl.acm.org/doi/abs/10.1145/3557915.3560951

GIS Cup 2025 uses GEO-BLEU as the evaluation metric, and this repository provides necessary resources for the evaluation.

GEO-BLEU is a similarity measure with a stronger focus on local features, as in similarity measures for natural language processing (e.g. BLEU). The more similar two trajectories are, the larger the value. It assigns a score of 1 to two identical trajectories.

Notes (in reverse chronological order):

  • Jul 30: Implemented calc_geobleu_bulk(), a supplementary function to evaluate sequences of multiple users at once and in parallel.
  • May 28: The validation tool and the parameters for GEO-BLEU have been updated to be compatible with the tasks of GIS Cup 2025.
  • May 28: Switched the GEO-BLEU evaluation function used in the usage example from the multi-process version calc_geobleu() to the single-process version calc_geobleu_single(). We will redesign the interface of the multi-process version so that it can handle data containing sequences of multiple users and re-release it soon.

Installation

After downloading the repository and entering its directory, execute the installation command as follows:

python3 setup.py install

or

pip3 install .

Prerequisites: numpy, scipy

Evaluation functions (per uid) for GIS Cup 2025

Overview

This package provides a per-uid evaluation function, calc_geobleu_single(), for comparing two trajectories under the conditions of the GIS Cup. The function takes the generated and reference trajectories belonging to a uid as its arguments and returns the GEO-BLEU similarity value. A trajectory is assumed to be a list of tuples, each representing (d, t, x, y) or (uid, d, t, x, y), and the day and time values must match between the generated and reference trajectories at each step. Internally, the function evaluates the trajectories day by day and returns the average over the days.

The final score for each city will be the average of the function's output over all the uids.
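As an illustration, the city-level score described above can be computed with a short loop. The sketch below assumes the trajectories have already been grouped into dicts keyed by uid; only calc_geobleu_single() is part of this package, everything else is illustrative.

import geobleu

# Sketch of the city-level scoring described above: the mean of per-uid
# GEO-BLEU values. Assumes generated_by_uid and reference_by_uid are dicts
# mapping uid -> list of (d, t, x, y) tuples (illustrative names, not part
# of the package).
def city_score(generated_by_uid, reference_by_uid):
    scores = [
        geobleu.calc_geobleu_single(generated_by_uid[uid], reference_by_uid[uid])
        for uid in sorted(generated_by_uid)
    ]
    return sum(scores) / len(scores)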

Also, the library provides a supplementary evaluation function, calc_geobleu_bulk(), to compute GEO-BLEU scores across multiple users in parallel by applying calc_geobleu_single() for each user. The arguments are as follows:

  • generated: A list of tuples, each in the format (uid, d, t, x, y), representing trajectories generated by a model for multiple users.
  • reference: A list of tuples, each in the format (uid, d, t, x, y), representing the ground-truth trajectories used for evaluation.
  • processes: The number of parallel processes to use (default: 4).

Internally, the function groups the input sequences by uid, verifies that both lists contain the same set of uids, and then computes calc_geobleu_single() for each uid in parallel. The final result is the arithmetic mean of the GEO-BLEU scores across all the uids.
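For reference, the behavior described above can be sketched with standard-library tools. This is a conceptual outline under the stated description, not the package's actual implementation.

from collections import defaultdict
from multiprocessing import Pool

import geobleu

# Conceptual sketch of what calc_geobleu_bulk() is described to do above;
# not the package's actual code. Group (uid, d, t, x, y) rows by uid, check
# that both sides cover the same uids, score each uid with
# calc_geobleu_single() in parallel, and average the results.
def group_by_uid(rows):
    grouped = defaultdict(list)
    for uid, d, t, x, y in rows:
        grouped[uid].append((d, t, x, y))
    return grouped

def bulk_score(generated, reference, processes=4):
    gen, ref = group_by_uid(generated), group_by_uid(reference)
    if set(gen) != set(ref):
        raise ValueError("generated and reference must contain the same set of uids")
    uids = sorted(gen)
    with Pool(processes) as pool:
        scores = pool.starmap(geobleu.calc_geobleu_single,
                              [(gen[uid], ref[uid]) for uid in uids])
    return sum(scores) / len(scores)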

Example usage of the evaluation functions

The following example calculates GEO-BLEU for two sample trajectories of a uid.

import geobleu

# tuple format: (d, t, x, y)
generated = [
    (60, 12, 84, 88),
    (60, 15, 114, 78),
    (60, 21, 121, 96),
    (61, 12, 78, 86),
    (61, 13, 89, 67),
    (61, 17, 97, 70),
    (61, 20, 96, 70),
    (61, 24, 111, 80),
    (61, 25, 114, 78),
    (61, 26, 99, 70),
    (61, 38, 77, 86),
    (62, 12, 77, 86),
    (62, 14, 102, 129),
    (62, 15, 104, 131),
    (62, 17, 106, 131),
    (62, 18, 104, 110)]

reference = [
    (60, 12, 82, 93),
    (60, 15, 114, 78),
    (60, 21, 116, 96),
    (61, 12, 82, 84),
    (61, 13, 89, 67),
    (61, 17, 97, 70),
    (61, 20, 91, 67),
    (61, 24, 109, 82),
    (61, 25, 110, 78),
    (61, 26, 99, 70),
    (61, 38, 77, 86),
    (62, 12, 77, 86),
    (62, 14, 97, 125),
    (62, 15, 104, 131),
    (62, 17, 106, 131),
    (62, 18, 103, 111)]

geobleu_val = geobleu.calc_geobleu_single(generated, reference)
print("geobleu: {}".format(geobleu_val))

# geobleu: 0.07556369896234784

Also, the following example calculates GEO-BLEU for trajectories of multiple users.

import geobleu

# tuple format: (uid, d, t, x, y)
generated = [
    (1, 60, 12, 84, 88),
    (1, 60, 21, 121, 96),
    (1, 61, 12, 78, 86),
    (1, 61, 20, 96, 70),
    (1, 61, 26, 99, 70),
    (1, 61, 38, 77, 86),
    (1, 62, 12, 77, 86),
    (1, 62, 18, 104, 110),
    (2, 60, 14, 25, 105),
    (2, 60, 15, 25, 103),
    (2, 61, 20, 35, 108),
    (2, 61, 31, 25, 96),
    (3, 61, 24, 74, 100),
    (3, 62, 7, 85, 72)]

reference = [
    (1, 60, 12, 82, 93),
    (1, 60, 21, 116, 96),
    (1, 61, 12, 82, 84),
    (1, 61, 20, 50, 48),
    (1, 61, 26, 99, 70),
    (1, 61, 38, 99, 70),
    (1, 62, 12, 77, 86),
    (1, 62, 18, 103, 111),
    (2, 60, 14, 26, 120),
    (2, 60, 15, 30, 103),
    (2, 61, 20, 35, 109),
    (2, 61, 31, 28, 96),
    (3, 61, 24, 82, 95),
    (3, 62, 7, 86, 70)]

geobleu_val = geobleu.calc_geobleu_bulk(generated, reference, processes=4)
print("geobleu: {}".format(geobleu_val))

# geobleu: 0.1653726297984943

Hyperparameter settings

As for the hyperparameters for GEO-BLEU, we use N = 5 (using unigrams, bigrams, up to 5-grams), w_n = 1/5 (modified precisions are geometric-averaged with equal weights), and beta = 0.5 (so that the proximity between two points becomes e^-1 when they are 1 km away).
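As a rough numerical check of the beta setting, assuming the proximity term has the paper's form exp(-beta * distance) and that distance is measured in 500 m grid-cell units (so two points 1 km apart are at distance 2; both assumptions are for illustration only):

import math

# Assumption for illustration only: proximity = exp(-beta * distance),
# with distance in 500 m grid-cell units, so points 1 km apart have distance 2.
beta = 0.5
proximity_at_1km = math.exp(-beta * 2.0)
print(proximity_at_1km)  # ~0.368, i.e. e^-1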

Validation tool

You can check whether your submission files conform to the task requirements using a standalone Python program, validator.py. It takes the task ID, the corresponding training data file path, and the submission file path as arguments, and it emits errors if it finds any issues with the formatting or inconsistencies between the training data and the given submission file. A submission file may begin with the header line uid,d,t,x,y, but omitting it is also acceptable.

For example, assuming task B's training data after decompression is at foo/city_B_challengedata.csv, and your submission file for task B before compression is at bar/baz_task_b_humob.csv, the command will be:

python3 validator.py b foo/city_B_challengedata.csv bar/baz_task_b_humob.csv

The line number and the step number in a trajectory in error messages are 0-indexed. If the tool doesn't find any issues, it will simply say "Validation finished without errors!".
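For reference, a submission file is a CSV in the same (uid, d, t, x, y) format used by the evaluation functions above. A minimal illustration, with made-up rows and the optional header included, looks like this:

uid,d,t,x,y
1,60,12,84,88
1,60,15,114,78
2,60,14,25,105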

Simple interface for evaluating a trajectory pair

Using the installed package, you can evaluate the similarity between a generated trajectory and a reference trajectory with calc_geobleu_orig(), passing the generated trajectory as the first argument and the reference trajectory as the second.

import geobleu

generated = [(1, 1), (2, 2), (3, 3)]
reference = [(1, 1), (1, 1), (1, 2), (2, 2), (2, 2)]

similarity = geobleu.calc_geobleu_orig(generated, reference)
print(similarity)
