Description
Is your feature request related to a problem or challenge?
I'd like to get the Metrics Reporting API into Rust. It's part of the catalog spec and currently only implemented by Iceberg Java (Python, Go and Rust haven't implemented it yet).
In short, the Metrics API allows us to monitor and better understand how clients operate on Iceberg files by providing numbers around:
- how long the scan planning phase took (i.e. determining the data files to look at)
- how many manifest files were considered, skipped and scanned (from a manifest list)
- how many data files were considered, skipped and scanned (from a manifest file)
- same for delete files
- similar metrics for commits
To me, this is a crucial piece of functionality. In an Iceberg deployment, various clients can use the same catalog but operate on cloud storage directly. From the catalog, we only get to see which tables are accessed, or how they are modified at a high level. The data plane is completely opaque to the catalog. This means it can be hard to understand how each of the clients is using the stored data, at least in a consistent way. The metrics reporting API provides a solution for this.
I've checked in the #rust Slack channel and there seems to be general interest for this functionality 👍
Note that I marked open questions with ❓.
Describe the solution you'd like
👣 The general steps I think this can be broken down into:
1. Foundational work
All other work items will re-use these APIs.
An API to consume reports (i.e. Java's `MetricsReporter` interface)
This trait will need to be async because it will likely use IO. Java's `RestMetricsReporter` is a good example: it uses the network to send out its metrics reports.
It will also need the supertraits `Debug`, `Send` and `Sync`. `Debug` is necessary because the structs that will hold the reporter derive it (e.g. `TableScan`). `Send` and `Sync` are necessary because the same reporter may be reused by many (concurrently running) operations.
Its visibility can be `pub(crate)` for now. This allows us to integrate metrics reporting into the table operations and provide a default implementation (like the `LoggingMetricsReporter`). Later, the trait can be made public so that users can implement their own, and we can add a `RestMetricsReporter`.
While metrics reporting can fail (e.g. when it involves IO), an error during reporting shouldn't affect the table operation in any way. To emphasize this, I think it makes sense to force implementations to handle or log any errors internally rather than return a `Result<()>`.
- ❓ The `report` parameter would probably suffice as a borrowed value. At the same time, reporting should be the terminal operation on any report; it shouldn't be reused afterwards. The signature might as well be explicit about this and take ownership.
```rust
use std::fmt::Debug;

use async_trait::async_trait;

/// This trait defines the basic API for reporting metrics on table operations.
#[async_trait]
pub(crate) trait MetricsReporter: Debug + Send + Sync {
    /// Indicates that an operation is done by publishing a MetricsReport.
    async fn report(&self, report: MetricsReport);
}
```
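For illustration, here's a minimal sketch of how an implementation would swallow errors internally rather than surface them to the table operation; the `SendingReporter` type and its `try_send` helper are hypothetical:

```rust
use async_trait::async_trait;

/// Hypothetical reporter that ships reports over the network but never
/// lets a failure escape into the calling table operation.
#[derive(Debug)]
struct SendingReporter {
    endpoint: String,
}

#[async_trait]
impl MetricsReporter for SendingReporter {
    async fn report(&self, report: MetricsReport) {
        // Handle or log any error internally; the signature deliberately
        // offers no way to propagate it.
        if let Err(e) = self.try_send(&report).await {
            eprintln!("failed to report metrics to {}: {e}", self.endpoint);
        }
    }
}

impl SendingReporter {
    async fn try_send(&self, _report: &MetricsReport) -> std::io::Result<()> {
        // ...actual network call elided...
        Ok(())
    }
}
```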
Types for metric reports
This will be the scan and commit reports, encoding all the information made available to the reporter. For a list of fields, we can refer to the Java implementations `ScanReport.java` and `CommitReport.java`.
Note that the `ScanMetrics` and `CommitMetrics` structs can grow quite large because they contain a bunch of counters (see `ScanMetricsResult.java` and `CommitMetricsResult.java` for reference). In my local draft, Clippy recommended to `Box` them to keep the enum variants similarly sized. This is subject to change.
```rust
pub(crate) enum MetricsReport {
    Scan {
        // ...All scan report fields...
        metrics: Box<ScanMetrics>,
    },
    Commit {
        // ...All commit report fields...
        metrics: Box<CommitMetrics>,
    },
}

pub(crate) struct ScanMetrics {
    total_planning_duration: std::time::Duration,
    result_data_files: Counter,
    // more metrics...
}
```
The Java implementation uses custom classes for both `DurationMetric` and `CounterMetric`. I don't see a lot of value in creating a separate type for the duration because the unit is already encoded in `std::time::Duration`. This is not the case for `u64`s, so I included a custom `Counter` in the snippet above. It probably makes sense to inform reporter implementations about which unit a newly added metric has, but I'm open to suggestions here.
```rust
pub(crate) struct Counter {
    pub(crate) value: u64,
    pub(crate) unit: String,
}
```
Since the first expected usage of these types is to only build them internally and then pass them to potentially public reporter implementations, we should keep the fields `pub(crate)` and provide accessors for them. This will only become relevant later, however.
```rust
impl ScanMetrics {
    // Accessors for all metrics, e.g.:
    pub(crate) fn result_data_files(&self) -> &Counter {
        &self.result_data_files
    }
}
```
In particular, this approach differs from the Java implementation in that the `MetricsReport` only defines an already built report, ready to be passed to the reporter; the Java implementation instead uses `MetricsReport`s and `MetricsReportResult`s. How the counters and durations are determined and set on the `ScanMetrics` will be discussed next.
2. Report building
Relevant table operations will need to create metric reports, populate them with data and pass them to the metrics reporter.
I've primarily looked at the table scan operation (see `TableScan::plan_files`) so far; this partially equates to Java's `SnapshotScan.planFiles`.
Challenges I've identified are mainly around its multi-threaded nature. The operation spawns multiple threads to process and filter manifest files and data (+ delete) files (example). This means that we can't simply pass the metrics structs around but need to come up with a more sophisticated builder pattern that takes concurrent writes into account. I'm still working on this in my local draft; a rough sketch follows below.
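To make the concurrency challenge concrete, one possible shape is a builder based on atomic counters that the concurrently running tasks share behind an `Arc`, snapshotted into a `ScanMetrics` once planning finishes. This is strictly a sketch; `ScanMetricsBuilder` and its methods are hypothetical names:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Hypothetical builder shared by the concurrent planning tasks.
#[derive(Debug, Default)]
pub(crate) struct ScanMetricsBuilder {
    result_data_files: AtomicU64,
    // ...one atomic per counter...
}

impl ScanMetricsBuilder {
    /// Safe to call from any task; no locking involved.
    pub(crate) fn inc_result_data_files(&self) {
        self.result_data_files.fetch_add(1, Ordering::Relaxed);
    }

    /// Snapshot the counters into an immutable report once planning is done.
    pub(crate) fn build(&self, total_planning_duration: Duration) -> ScanMetrics {
        ScanMetrics {
            total_planning_duration,
            result_data_files: Counter {
                value: self.result_data_files.load(Ordering::Relaxed),
                unit: "data files".to_string(),
            },
        }
    }
}
```

Each spawned task would then hold a clone of an `Arc<ScanMetricsBuilder>` and bump the relevant counters as it filters manifests and data files; `plan_files` would call `build` at the end and wrap the result in a `MetricsReport::Scan` for the reporter.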
Commit reporting will likely be analogous, but I haven't looked into it yet. It could also be added as a follow-up contribution.
3. Reporter implementations
To serve most use cases, and to validate the API, I think it will make sense to include at least one default implementation: the trivial `LoggingMetricsReporter`.
- ❓ I don't yet see `tracing` or another log crate used outside of tests. If we don't want to commit to a specific log crate, we may need to leave this implementation to the user or gate it behind a feature flag.
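For illustration only, and assuming for the moment that we settled on the `log` facade (which is exactly the open question above), the logging reporter could be as small as this, provided `MetricsReport` derives `Debug`:

```rust
use async_trait::async_trait;
use log::info;

#[derive(Debug)]
pub(crate) struct LoggingMetricsReporter;

#[async_trait]
impl MetricsReporter for LoggingMetricsReporter {
    async fn report(&self, report: MetricsReport) {
        // No IO beyond logging, so there is no error to handle here.
        info!("metrics report: {report:?}");
    }
}
```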
In a follow-up I would like to add a REST reporter. In Java, this is the default reporter whenever a REST catalog is used. It's basically sending the report to the catalog so it can aggregate reports across clients.
The `iceberg-rest-fixture` that's already used in this repo ships with the `/namespaces/{namespace}/tables/{table}/metrics` endpoint, which can be used to validate this implementation.
4. Configuration
As a start, we will only have the `LoggingMetricsReporter` available. The Java implementation uses it by default when nothing else is configured. For as long as the reporter API is kept private, we won't need to expose any configuration.
Looking forward, users may want to pass in their own reporter implementations. The Java implementation allows this at the catalog level as well as on individual table operations. If we decide to follow this approach, the reporter will need to be stored in the catalog and then passed down to scan and commit operations, as sketched below.
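To sketch what that wiring could look like (the concrete types and field names here are placeholders, not the actual iceberg-rust APIs):

```rust
use std::sync::Arc;

pub struct Catalog {
    /// Injected at construction time; defaults to the logging reporter.
    metrics_reporter: Arc<dyn MetricsReporter>,
}

pub struct Table {
    metrics_reporter: Arc<dyn MetricsReporter>,
}

impl Catalog {
    pub fn load_table(&self) -> Table {
        // The catalog hands its reporter down to every table it loads, so
        // scans and commits on those tables all report to the same sink.
        Table {
            metrics_reporter: self.metrics_reporter.clone(),
        }
    }
}
```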
Note: While I will be able to work on this independently, I'm still happy and thankful for feedback or comments! 🙏
Willingness to contribute
I can contribute to this feature independently