Server components to receive, validate, convert, store, and process Telemetry data from the Mozilla Firefox browser.
Talk to us on irc.mozilla.org in the #telemetry channel.
See the TODO list:
- Nail down the new storage format based on Bug 856263
- Define on-disk storage structure based on the telemetry-reboot etherpad
- Build a converter to take existing data as input and output in the new format + structure
- Plumb converter into the current pipeline (Bagheera -> Kafka -> converter -> format.v2)
- Build MapReduce framework to take new format + structure as input and output data as required by the telemetry-frontend
- Build replacement frontend acquisition pipeline (HTTP -> persister -> format.v2)
The storage format is documented in StorageFormat, and the on-disk directory structure is documented in StorageLayout.
The data converter will:
- Use RevisionCache to load the correct Histograms.json for a given payload
  - Use `revision` if possible
  - Fall back to `appUpdateChannel` and `appBuildID` or `appVersion` as needed
  - Use the Mercurial history to export each version of Histograms.json, along with the date range during which it was in effect, for each repo (mozilla-central, -aurora, -beta, -release)
  - Keep a local cache of Histograms.json versions to avoid re-fetching
- Filter out bad submission data (see the sketch below)
  - Invalid histogram names
  - Histogram configs that don't match the expected parameters (histogram type, number of buckets, etc.)
  - Keep metrics for bad data
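As a rough illustration of the filtering step, here is a minimal sketch of validating one submitted histogram against a definition derived from Histograms.json. It is not the repository's converter code, and the field names (`histogram_type`, `bucket_count`) are assumptions for illustration.

```python
# Minimal sketch (not the repo's actual code): validate one submitted histogram
# against a definition derived from Histograms.json. The field names used here
# ("histogram_type", "bucket_count") are assumptions for illustration.

def validate_histogram(name, submitted, definitions):
    """Return None if the submission looks sane, else a reason string."""
    definition = definitions.get(name)
    if definition is None:
        return "unknown histogram name: %s" % name
    if submitted.get("histogram_type") != definition.get("histogram_type"):
        return "histogram_type mismatch for %s" % name
    if submitted.get("bucket_count") != definition.get("bucket_count"):
        return "bucket_count mismatch for %s" % name
    return None

# A converter would count (rather than crash on) bad entries, keeping metrics.
definitions = {"GC_MS": {"histogram_type": 0, "bucket_count": 50}}
payload = {"GC_MS": {"histogram_type": 0, "bucket_count": 50, "values": {"0": 3}},
           "BOGUS": {"histogram_type": 1, "bucket_count": 10, "values": {}}}
bad = {name: err for name, err in
       ((n, validate_histogram(n, h, definitions)) for n, h in payload.items()) if err}
print(bad)  # {'BOGUS': 'unknown histogram name: BOGUS'}
```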
We have implemented a lightweight MapReduce framework that uses the operating system's support for parallelism. It relies on simple Python functions for the Map, Combine, and Reduce phases.
For data stored on multiple machines, each machine will run a combine phase, with the final reduce combining output for the entire cluster.
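To make that shape concrete, here is a self-contained sketch of the map/combine/reduce idea using Python's `multiprocessing` module for process-level parallelism. It is an illustration only; none of these names come from the repository, and the real framework's interfaces are described in the MapReduce section below.

```python
# Illustrative sketch only: plain Python map/combine/reduce functions run in
# parallel via OS processes. The real framework's interfaces differ.
import json
from collections import Counter
from multiprocessing import Pool

def map_phase(record):
    # Emit (key, count) pairs; here, one per histogram name in a payload.
    payload = json.loads(record)
    return [(name, 1) for name in payload.get("histograms", {})]

def combine_phase(pairs):
    # Local (per-worker) aggregation before the final reduce.
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts

def reduce_phase(partials):
    # Merge the per-worker combiner outputs into a single result.
    total = Counter()
    for part in partials:
        total.update(part)
    return total

if __name__ == "__main__":
    records = [json.dumps({"histograms": {"GC_MS": {}, "CYCLE_COLLECTOR": {}}}),
               json.dumps({"histograms": {"GC_MS": {}}})]
    with Pool(2) as pool:
        mapped = pool.map(map_phase, records)      # map phase, one record per task
    combined = [combine_phase(pairs) for pairs in mapped]
    print(reduce_phase(combined))                  # Counter({'GC_MS': 2, ...})
```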
Once we have the converter and MapReduce framework available, we can easily consume from the existing Telemetry data source. This will mark the first point at which the new dashboards can be fed with live data.
Integration with the existing pipeline is discussed in more detail on the Bagheera Integration page.
When everything is ready and running in production, we will route client (Firefox) submissions directly into the new pipeline.
Contains the prototype HTTP server for receiving payloads. The `submit` function is where the interesting things happen.
It accepts single submissions using the same type of URLs supported by Bagheera, and also has endpoints for batch submission (which improves throughput for the production -> prototype relay).
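For example, a single submission might be sent like this; the host, port, and exact URL path here are assumptions, so check the server code for the actual routes.

```python
# Sketch of a single submission using a Bagheera-style URL. The host, port,
# and exact path are assumptions; check the server code for the real routes.
import json
import uuid
from urllib.request import Request, urlopen

payload = json.dumps({"ver": 1, "histograms": {}}).encode("utf-8")
url = "http://localhost:8080/submit/telemetry/%s" % uuid.uuid4()
request = Request(url, data=payload, headers={"Content-Type": "application/json"})
with urlopen(request) as response:
    print(response.status, response.read())
```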
Contains the `Converter` class, which is used to convert a JSON payload from
the raw form submitted by Firefox to the more compact storage format for
on-disk storage and processing.
You can run the main method in this file to process data exported from the old telemetry backend (via Pig, Jydoop, etc.), or you can use the `Converter` class to convert data in a more fine-grained way.
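To give a feel for the kind of transformation involved (this is not the actual compact format; see StorageFormat for that), a raw histogram's sparse `values` map can be flattened into a dense list of bucket counts:

```python
# Illustration only: compacting a raw histogram's sparse "values" map into a
# dense list of bucket counts. The real storage format is documented in
# StorageFormat and differs from this sketch.

def compact_histogram(raw, buckets):
    """Convert {"values": {"1": 3, "8": 1}, "sum": 11} into [counts..., sum]."""
    values = raw.get("values", {})
    counts = [int(values.get(str(b), 0)) for b in buckets]
    return counts + [raw.get("sum", 0)]

raw = {"values": {"1": 3, "8": 1}, "sum": 11}
print(compact_histogram(raw, buckets=[0, 1, 2, 4, 8, 16]))  # [0, 3, 0, 0, 1, 0, 11]
```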
Contains the `StorageLayout` class, which is used to save payloads to disk
using the directory structure as documented in the storage layout section
above.
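The idea, roughly, is to derive a file path from a payload's dimensions and append records to it. The sketch below is hypothetical; the dimension names and file naming are assumptions, and the real structure is documented in StorageLayout.

```python
# Hypothetical sketch of the idea behind StorageLayout: append each converted
# payload to a file whose path is built from a few payload dimensions.
import os

def store(base_dir, dims, record_line):
    # dims is e.g. ("saved_session", "Firefox", "nightly", "27.0a1", "20131001")
    path = os.path.join(base_dir, *dims) + ".log"
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "a") as f:
        f.write(record_line.rstrip("\n") + "\n")

store("/tmp/telemetry_data",
      ("saved_session", "Firefox", "nightly", "27.0a1", "20131001"),
      '{"example-document-id": {}}')
```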
Contains the `RevisionCache` class, which provides a mechanism for fetching the Histograms.json spec file for a given revision URL. Histogram data is
cached locally on disk and in-memory as revisions are requested.
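A simplified cache in the same spirit might look like the following sketch; the hg.mozilla.org raw-file URL pattern and the in-tree path to Histograms.json are assumptions rather than the class's actual behaviour.

```python
# Illustrative cache in the spirit of RevisionCache: fetch Histograms.json for
# a revision URL once, then serve it from memory and from a local disk cache.
# The raw-file URL pattern and in-tree path below are assumptions.
import hashlib
import json
import os
from urllib.request import urlopen

class HistogramsCache(object):
    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        self.memory = {}
        os.makedirs(cache_dir, exist_ok=True)

    def get(self, revision_url):
        # revision_url is e.g. https://hg.mozilla.org/mozilla-central/rev/<changeset>
        if revision_url in self.memory:
            return self.memory[revision_url]
        disk_path = os.path.join(
            self.cache_dir, hashlib.sha1(revision_url.encode()).hexdigest() + ".json")
        if os.path.exists(disk_path):
            with open(disk_path) as f:
                spec = json.load(f)
        else:
            raw_url = revision_url.replace(
                "/rev/", "/raw-file/") + "/toolkit/components/telemetry/Histograms.json"
            with urlopen(raw_url) as response:
                spec = json.loads(response.read().decode("utf-8"))
            with open(disk_path, "w") as f:
                json.dump(spec, f)
        self.memory[revision_url] = spec
        return spec
```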
Contains the `TelemetrySchema` class, which encapsulates logic used by the
StorageLayout and MapReduce code.
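Conceptually, centralizing the schema means the same ordered list of dimensions drives both where a record is written and which files a job needs to read. The sketch below is hypothetical and does not reflect the real API.

```python
# Hypothetical sketch of the kind of logic a schema class centralizes. The
# names here are illustrative, not the real TelemetrySchema interface.
class Schema(object):
    def __init__(self, dimensions):
        self.dimensions = dimensions  # ordered dimension names

    def dims_from_payload(self, info):
        return [str(info.get(d, "UNKNOWN")) for d in self.dimensions]

    def matches(self, dims, allowed):
        # allowed maps dimension name -> set of values (None means "any value")
        return all(allowed.get(d) is None or v in allowed[d]
                   for d, v in zip(self.dimensions, dims))

schema = Schema(["reason", "appName", "appUpdateChannel"])
dims = schema.dims_from_payload({"reason": "saved_session", "appName": "Firefox",
                                 "appUpdateChannel": "nightly"})
print(schema.matches(dims, {"appUpdateChannel": {"nightly", "aurora"}}))  # True
```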
Contains the MapReduce code. This is the interface for running jobs on
Telemetry data. There are example job scripts and input filters in the
`examples/` directory.
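A job script is, roughly, a module exposing plain functions that the framework calls; the signatures below are an assumption based on the description above, so see the scripts in `examples/` for the real interface.

```python
# Rough idea of a job script: the framework imports the module and calls these
# functions. The exact signatures it expects are an assumption here; see the
# scripts in examples/ for the real interface.
import json

def map(key, dims, value, context):
    # key: document id, dims: storage dimensions, value: the converted payload
    payload = json.loads(value)
    for name in payload.get("histograms", {}):
        context.write(name, 1)

def reduce(key, values, context):
    # Called once per key with all values emitted for that key.
    context.write(key, sum(values))
```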
Contains code to compress and rotate raw data files. Suitable for running from `cron`.
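As an illustration of the rotation step (not the repository's actual script), something along these lines could gzip each raw log file and rename it with a timestamp so the server can start writing a fresh file; the paths and naming are assumptions.

```python
# Illustration of the compress-and-rotate idea (not the repository's script):
# gzip each raw .log file and rename it with a timestamp. Paths are assumptions.
import glob
import gzip
import os
import shutil
import time

def rotate(data_dir):
    stamp = time.strftime("%Y%m%d%H%M%S")
    for path in glob.glob(os.path.join(data_dir, "**", "*.log"), recursive=True):
        rotated = "%s.%s.gz" % (path, stamp)
        with open(path, "rb") as src, gzip.open(rotated, "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(path)

if __name__ == "__main__":
    rotate("/tmp/telemetry_data")
```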