Server components to receive, validate, convert, store, and process Telemetry data from the Mozilla Firefox browser.
Talk to us on irc.mozilla.org in the #telemetry channel.
See the TODO list:
- Nail down the new storage format based on Bug 856263
- Define on-disk storage structure based on the telemetry-reboot etherpad
- Build a converter to take existing data as input and output in the new format + structure
- Plumb converter into the current pipeline (Bagheera -> Kafka -> converter -> format.v2)
- Build MapReduce framework to take new format + structure as input and output data as required by the telemetry-frontend
- Build replacement frontend acquisition pipeline (HTTP -> persister -> format.v2)
The storage format is documented in StorageFormat, and the on-disk directory structure is documented in StorageLayout.
The data converter will:
- Use RevisionCache to load the correct Histograms.json for a given payload
  - Use `revision` if possible
  - Fall back to `appUpdateChannel` and `appBuildID` or `appVersion` as needed
  - Use the Mercurial history to export each version of Histograms.json, along with the date range during which it was in effect, for each repo (mozilla-central, -aurora, -beta, -release)
  - Keep a local cache of Histograms.json versions to avoid re-fetching
- Filter out bad submission data (see the sketch below)
  - Invalid histogram names
  - Histogram configs that don't match the expected parameters (histogram type, number of buckets, etc.)
  - Keep metrics for bad data
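As a rough illustration of the filtering step, here is a minimal sketch of validating one submitted histogram against a definition derived from Histograms.json. It is not the repository's converter code, and the field names (`histogram_type`, `bucket_count`) are assumptions for illustration.

```python
# Minimal sketch (not the repo's actual code): validate one submitted histogram
# against a definition derived from Histograms.json. The field names used here
# ("histogram_type", "bucket_count") are assumptions for illustration.

def validate_histogram(name, submitted, definitions):
    """Return None if the submission looks sane, else a reason string."""
    definition = definitions.get(name)
    if definition is None:
        return "unknown histogram name: %s" % name
    if submitted.get("histogram_type") != definition.get("histogram_type"):
        return "histogram_type mismatch for %s" % name
    if submitted.get("bucket_count") != definition.get("bucket_count"):
        return "bucket_count mismatch for %s" % name
    return None

# A converter would count (rather than crash on) bad entries, keeping metrics.
definitions = {"GC_MS": {"histogram_type": 0, "bucket_count": 50}}
payload = {"GC_MS": {"histogram_type": 0, "bucket_count": 50, "values": {"0": 3}},
           "BOGUS": {"histogram_type": 1, "bucket_count": 10, "values": {}}}
bad = {name: err for name, err in
       ((n, validate_histogram(n, h, definitions)) for n, h in payload.items()) if err}
print(bad)  # {'BOGUS': 'unknown histogram name: BOGUS'}
```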
We have implemented a lightweight MapReduce framework that uses the operating system's support for parallelism. It relies on simple Python functions for the Map, Combine, and Reduce phases.
For data stored on multiple machines, each machine will run a combine phase, with the final reduce combining output for the entire cluster.
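To make that shape concrete, here is a self-contained sketch of the map/combine/reduce idea using Python's `multiprocessing` module for process-level parallelism. It is an illustration only; none of these names come from the repository, and the real framework's interfaces are described in the MapReduce section below.

```python
# Illustrative sketch only: plain Python map/combine/reduce functions run in
# parallel via OS processes. The real framework's interfaces differ.
import json
from collections import Counter
from multiprocessing import Pool

def map_phase(record):
    # Emit (key, count) pairs; here, one per histogram name in a payload.
    payload = json.loads(record)
    return [(name, 1) for name in payload.get("histograms", {})]

def combine_phase(pairs):
    # Local (per-worker) aggregation before the final reduce.
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts

def reduce_phase(partials):
    # Merge the per-worker combiner outputs into a single result.
    total = Counter()
    for part in partials:
        total.update(part)
    return total

if __name__ == "__main__":
    records = [json.dumps({"histograms": {"GC_MS": {}, "CYCLE_COLLECTOR": {}}}),
               json.dumps({"histograms": {"GC_MS": {}}})]
    with Pool(2) as pool:
        mapped = pool.map(map_phase, records)      # map phase, one record per task
    combined = [combine_phase(pairs) for pairs in mapped]
    print(reduce_phase(combined))                  # Counter({'GC_MS': 2, ...})
```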
Once we have the converter and MapReduce framework available, we can easily consume from the existing Telemetry data source. This will mark the first point at which the new dashboards can be fed with live data.
Integration with the existing pipeline is discussed in more detail on the Bagheera Integration page.
When everything is ready and running in production, we will route client (Firefox) submissions directly into the new pipeline.
Contains the prototype HTTP server for receiving payloads. The `submit` function is where the interesting things happen.
It accepts single submissions using the same type of URLs supported by Bagheera, and also has endpoints for batch submission (which improves throughput for the production -> prototype relay).
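For example, a single submission might be sent like this; the host, port, and exact URL path here are assumptions, so check the server code for the actual routes.

```python
# Sketch of a single submission using a Bagheera-style URL. The host, port,
# and exact path are assumptions; check the server code for the real routes.
import json
import uuid
from urllib.request import Request, urlopen

payload = json.dumps({"ver": 1, "histograms": {}}).encode("utf-8")
url = "http://localhost:8080/submit/telemetry/%s" % uuid.uuid4()
request = Request(url, data=payload, headers={"Content-Type": "application/json"})
with urlopen(request) as response:
    print(response.status, response.read())
```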
Contains the `Converter` class, which is used to convert a JSON payload from
the raw form submitted by Firefox to the more compact storage format for
on-disk storage and processing.
You can run the main method in this file to process data exported from the old telemetry backend (via Pig, Jydoop, etc.), or you can use the `Converter` class to convert data in a more fine-grained way.
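To give a feel for the kind of transformation involved (this is not the actual compact format; see StorageFormat for that), a raw histogram's sparse `values` map can be flattened into a dense list of bucket counts:

```python
# Illustration only: compacting a raw histogram's sparse "values" map into a
# dense list of bucket counts. The real storage format is documented in
# StorageFormat and differs from this sketch.

def compact_histogram(raw, buckets):
    """Convert {"values": {"1": 3, "8": 1}, "sum": 11} into [counts..., sum]."""
    values = raw.get("values", {})
    counts = [int(values.get(str(b), 0)) for b in buckets]
    return counts + [raw.get("sum", 0)]

raw = {"values": {"1": 3, "8": 1}, "sum": 11}
print(compact_histogram(raw, buckets=[0, 1, 2, 4, 8, 16]))  # [0, 3, 0, 0, 1, 0, 11]
```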
Contains the `StorageLayout` class, which is used to save payloads to disk
using the directory structure as documented in the storage layout section
above.
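The idea, roughly, is to derive a file path from a payload's dimensions and append records to it. The sketch below is hypothetical; the dimension names and file naming are assumptions, and the real structure is documented in StorageLayout.

```python
# Hypothetical sketch of the idea behind StorageLayout: append each converted
# payload to a file whose path is built from a few payload dimensions.
import os

def store(base_dir, dims, record_line):
    # dims is e.g. ("saved_session", "Firefox", "nightly", "27.0a1", "20131001")
    path = os.path.join(base_dir, *dims) + ".log"
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "a") as f:
        f.write(record_line.rstrip("\n") + "\n")

store("/tmp/telemetry_data",
      ("saved_session", "Firefox", "nightly", "27.0a1", "20131001"),
      '{"example-document-id": {}}')
```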
Contains the `RevisionCache` class, which provides a mechanism for fetching the Histograms.json spec file for a given revision URL. Histogram data is
cached locally on disk and in-memory as revisions are requested.
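A simplified cache in the same spirit might look like the following sketch; the hg.mozilla.org raw-file URL pattern and the in-tree path to Histograms.json are assumptions rather than the class's actual behaviour.

```python
# Illustrative cache in the spirit of RevisionCache: fetch Histograms.json for
# a revision URL once, then serve it from memory and from a local disk cache.
# The raw-file URL pattern and in-tree path below are assumptions.
import hashlib
import json
import os
from urllib.request import urlopen

class HistogramsCache(object):
    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        self.memory = {}
        os.makedirs(cache_dir, exist_ok=True)

    def get(self, revision_url):
        # revision_url is e.g. https://hg.mozilla.org/mozilla-central/rev/<changeset>
        if revision_url in self.memory:
            return self.memory[revision_url]
        disk_path = os.path.join(
            self.cache_dir, hashlib.sha1(revision_url.encode()).hexdigest() + ".json")
        if os.path.exists(disk_path):
            with open(disk_path) as f:
                spec = json.load(f)
        else:
            raw_url = revision_url.replace(
                "/rev/", "/raw-file/") + "/toolkit/components/telemetry/Histograms.json"
            with urlopen(raw_url) as response:
                spec = json.loads(response.read().decode("utf-8"))
            with open(disk_path, "w") as f:
                json.dump(spec, f)
        self.memory[revision_url] = spec
        return spec
```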
Contains the `TelemetrySchema` class, which encapsulates logic used by the
StorageLayout and MapReduce code.
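Conceptually, centralizing the schema means the same ordered list of dimensions drives both where a record is written and which files a job needs to read. The sketch below is hypothetical and does not reflect the real API.

```python
# Hypothetical sketch of the kind of logic a schema class centralizes. The
# names here are illustrative, not the real TelemetrySchema interface.
class Schema(object):
    def __init__(self, dimensions):
        self.dimensions = dimensions  # ordered dimension names

    def dims_from_payload(self, info):
        return [str(info.get(d, "UNKNOWN")) for d in self.dimensions]

    def matches(self, dims, allowed):
        # allowed maps dimension name -> set of values (None means "any value")
        return all(allowed.get(d) is None or v in allowed[d]
                   for d, v in zip(self.dimensions, dims))

schema = Schema(["reason", "appName", "appUpdateChannel"])
dims = schema.dims_from_payload({"reason": "saved_session", "appName": "Firefox",
                                 "appUpdateChannel": "nightly"})
print(schema.matches(dims, {"appUpdateChannel": {"nightly", "aurora"}}))  # True
```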
Contains the MapReduce code. This is the interface for running jobs on
Telemetry data. There are example job scripts and input filters in the
`examples/` directory.
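A job script is, roughly, a module exposing plain functions that the framework calls; the signatures below are an assumption based on the description above, so see the scripts in `examples/` for the real interface.

```python
# Rough idea of a job script: the framework imports the module and calls these
# functions. The exact signatures it expects are an assumption here; see the
# scripts in examples/ for the real interface.
import json

def map(key, dims, value, context):
    # key: document id, dims: storage dimensions, value: the converted payload
    payload = json.loads(value)
    for name in payload.get("histograms", {}):
        context.write(name, 1)

def reduce(key, values, context):
    # Called once per key with all values emitted for that key.
    context.write(key, sum(values))
```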
Contains code to compress and rotate raw data files. Suitable for running from `cron`.
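As an illustration of the rotation step (not the repository's actual script), something along these lines could gzip each raw log file and rename it with a timestamp so the server can start writing a fresh file; the paths and naming are assumptions.

```python
# Illustration of the compress-and-rotate idea (not the repository's script):
# gzip each raw .log file and rename it with a timestamp. Paths are assumptions.
import glob
import gzip
import os
import shutil
import time

def rotate(data_dir):
    stamp = time.strftime("%Y%m%d%H%M%S")
    for path in glob.glob(os.path.join(data_dir, "**", "*.log"), recursive=True):
        rotated = "%s.%s.gz" % (path, stamp)
        with open(path, "rb") as src, gzip.open(rotated, "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(path)

if __name__ == "__main__":
    rotate("/tmp/telemetry_data")
```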