Metrics reporting #1496

Draft · wants to merge 25 commits into base: main

Commits
eb8109c
Foundational work
DerGut Jun 20, 2025
20a0e80
refactor: TableScan::plan_files into parallel steps
DerGut Jun 26, 2025
e1dc699
Use serialization-based logger
DerGut Jun 27, 2025
16af416
Set metrics reporter on TableScan
DerGut Jun 27, 2025
8832027
Collect metrics for indexed deletes
DerGut Jun 27, 2025
05dc825
Collect manifest file metrics
DerGut Jun 29, 2025
ce52bf6
Drop unnecessary Box<>
DerGut Jun 29, 2025
3bec473
Collect metrics for data and delete files
DerGut Jun 29, 2025
4fcfbed
Inlcude metrics mod
DerGut Jun 29, 2025
7242774
Include TableIdent in TableScan
DerGut Jun 29, 2025
1b8e8c6
Send metrics report
DerGut Jun 29, 2025
ddc9627
Add missing brackets around import
DerGut Jun 29, 2025
f1e598d
Test metrics reporting
DerGut Jun 29, 2025
08f72bd
Move stream writing outside of processing functions
DerGut Jun 29, 2025
683ad4f
Replace Box<ScanMetrics> with Arc<ScanMetrics>
DerGut Jul 1, 2025
876c708
Rever vec comment
DerGut Jul 6, 2025
c74e855
Move JoinHandle for delete index metrics
DerGut Jul 6, 2025
90ce07c
Use JoinHandle for delete file metrics
DerGut Jul 6, 2025
630cc53
Use JoinHandle for data file metrics and refactor
DerGut Jul 6, 2025
ed4987d
Be explicit about JoinHandles and awaits
DerGut Jul 7, 2025
467c565
Simplify LoggingMetricsReporter
DerGut Jul 7, 2025
9e34b42
Feature-flag TableBuilder::metrics_reporter for tests only
DerGut Jul 7, 2025
166cf5d
Join all metrics handles
DerGut Jul 7, 2025
22c4612
Fix clippy warnings
DerGut Jul 7, 2025
ed6c139
Remove unclear comment
DerGut Jul 7, 2025
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions crates/iceberg/Cargo.toml
Expand Up @@ -90,6 +90,7 @@ typed-builder = { workspace = true }
url = { workspace = true }
uuid = { workspace = true }
zstd = { workspace = true }
tracing = { workspace = true }
Comment (Contributor Author):

So far, tracing was only used in tests. As far as I could find, the LoggingMetricsReporter is the first use of any logging in the iceberg crate.
I'm not entirely sure whether it's a good idea to include it and commit to a specific logging crate. tracing seems reasonably standard and compatible with other crates, though. I'd also like to include some default reporter; the Java implementation comes with its LoggingMetricsReporter.java, based on SLF4J.

I've also run into some issues using the tracing crate (as outlined in this comment), but they can probably be worked around and shouldn't be a deciding factor.
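The trade-off discussed above is smaller if the logging backend stays hidden behind the reporter trait. A minimal, hypothetical std-only sketch (not the PR's actual API — the real trait is async and the default impl uses tracing):

```rust
use std::fmt::Debug;

// Simplified, synchronous stand-in for the crate's MetricsReporter trait.
// Because callers only depend on the trait, the choice of logging crate
// (tracing, log, ...) stays confined to one default impl and can be swapped.
trait MetricsReporter: Debug {
    fn report(&self, report: &str) -> String;
}

#[derive(Debug)]
struct StdoutReporter;

impl MetricsReporter for StdoutReporter {
    fn report(&self, report: &str) -> String {
        // A real impl would emit a structured log record instead.
        let line = format!("metrics report: {report}");
        println!("{line}");
        line
    }
}

fn main() {
    let reporter: Box<dyn MetricsReporter> = Box::new(StdoutReporter);
    let line = reporter.report("scan finished");
    assert_eq!(line, "metrics report: scan finished");
}
```

The point of the sketch is the seam, not the backend: replacing StdoutReporter with a tracing-based reporter would not change any caller.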


[dev-dependencies]
ctor = { workspace = true }
Expand Down
51 changes: 44 additions & 7 deletions crates/iceberg/src/delete_file_index.rs
Expand Up @@ -21,9 +21,10 @@ use std::sync::{Arc, RwLock};

use futures::StreamExt;
use futures::channel::mpsc::{Sender, channel};
use itertools::Itertools;
use tokio::sync::Notify;

use crate::runtime::spawn;
use crate::runtime::{JoinHandle, spawn};
use crate::scan::{DeleteFileContext, FileScanTaskDeleteFile};
use crate::spec::{DataContentType, DataFile, Struct};

Expand Down Expand Up @@ -51,33 +52,52 @@ struct PopulatedDeleteFileIndex {
// TODO: Deletion Vector support
}

#[derive(Debug)]
pub(crate) struct DeleteIndexMetrics {
pub(crate) indexed_delete_files: u32,
pub(crate) equality_delete_files: u32,
pub(crate) positional_delete_files: u32,
}

impl DeleteFileIndex {
/// create a new `DeleteFileIndex` along with the sender that populates it with delete files
pub(crate) fn new() -> (DeleteFileIndex, Sender<DeleteFileContext>) {
/// Create a new `DeleteFileIndex` along with the sender that populates it
/// with delete files
///
/// It will asynchronously wait for all delete files to come in before it
/// starts indexing.
pub(crate) fn new() -> (
DeleteFileIndex,
Sender<DeleteFileContext>,
JoinHandle<DeleteIndexMetrics>,
) {
// TODO: what should the channel limit be?
let (tx, rx) = channel(10);
let (delete_file_tx, delete_file_rx) = channel(10);
let notify = Arc::new(Notify::new());
let state = Arc::new(RwLock::new(DeleteFileIndexState::Populating(
notify.clone(),
)));
let delete_file_stream = rx.boxed();
let delete_file_stream = delete_file_rx.boxed();

spawn({
let metrics_handle = spawn({
let state = state.clone();
async move {
let delete_files = delete_file_stream.collect::<Vec<_>>().await;

let populated_delete_file_index = PopulatedDeleteFileIndex::new(delete_files);

let metrics = populated_delete_file_index.metrics();

{
let mut guard = state.write().unwrap();
*guard = DeleteFileIndexState::Populated(populated_delete_file_index);
}
notify.notify_waiters();

metrics
}
});

(DeleteFileIndex { state }, tx)
(DeleteFileIndex { state }, delete_file_tx, metrics_handle)
}
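The constructor pattern above — hand producers a sender, and return a join handle that resolves to metrics once the channel is drained — can be sketched with std-library stand-ins (mpsc and threads in place of the crate's async channel and runtime, both hypothetical here):

```rust
use std::sync::mpsc;
use std::thread;

// Stand-in for DeleteFileIndex::new(): the spawned collector waits for all
// delete files to come in before "indexing", then yields its metrics
// (here just a count, standing in for DeleteIndexMetrics).
fn new_index() -> (mpsc::Sender<u32>, thread::JoinHandle<usize>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        // Blocks until every sender has been dropped.
        let delete_files: Vec<u32> = rx.iter().collect();
        delete_files.len()
    });
    (tx, handle)
}

fn main() {
    let (tx, handle) = new_index();
    for file_id in 0..3 {
        tx.send(file_id).unwrap();
    }
    drop(tx); // closing the channel lets the collector finish
    assert_eq!(handle.join().unwrap(), 3);
}
```

As in the PR, the metrics are only available after the population phase completes, which is exactly what awaiting the handle expresses.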

/// Gets all the delete files that apply to the specified data file.
Expand Down Expand Up @@ -207,4 +227,21 @@ impl PopulatedDeleteFileIndex {

results
}

fn metrics(&self) -> DeleteIndexMetrics {
// We count both partitioned and globally applied equality deletes.
let equality_delete_files =
flattened_len(&self.eq_deletes_by_partition) + self.global_deletes.len() as u32;
let positional_delete_files = flattened_len(&self.pos_deletes_by_partition);

DeleteIndexMetrics {
indexed_delete_files: equality_delete_files + positional_delete_files,
equality_delete_files,
positional_delete_files,
}
}
}

fn flattened_len(map: &HashMap<Struct, Vec<Arc<DeleteFileContext>>>) -> u32 {
map.values().flatten().try_len().unwrap_or(0) as u32
}
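The counting done by `metrics()` above can be illustrated with plain types standing in for `Struct` and `DeleteFileContext` (a sketch, not the crate's code); summing per-partition lengths gives the same result as flattening and counting:

```rust
use std::collections::HashMap;

// Stand-in for the flattened_len helper: total entries across all partitions.
fn flattened_len(map: &HashMap<&str, Vec<u32>>) -> u32 {
    map.values().map(|files| files.len() as u32).sum()
}

fn main() {
    let mut eq_deletes_by_partition = HashMap::new();
    eq_deletes_by_partition.insert("p0", vec![10, 11]);
    eq_deletes_by_partition.insert("p1", vec![12]);
    let global_deletes = 1u32; // globally applied equality deletes

    // Mirrors metrics(): partitioned plus global equality deletes.
    let equality_delete_files = flattened_len(&eq_deletes_by_partition) + global_deletes;
    assert_eq!(equality_delete_files, 4);
}
```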
6 changes: 3 additions & 3 deletions crates/iceberg/src/lib.rs
Expand Up @@ -70,11 +70,11 @@ pub mod table;

mod avro;
pub mod cache;
pub mod io;
pub mod spec;

pub mod inspect;
pub mod io;
pub mod metrics;
pub mod scan;
pub mod spec;

pub mod expr;
pub mod transaction;
Expand Down
154 changes: 154 additions & 0 deletions crates/iceberg/src/metrics.rs
@@ -0,0 +1,154 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

//! This module contains the metrics reporting API for Iceberg.
//!
//! It is used to report table operations in a pluggable way. See the [docs]
//! for more details.
//!
//! [docs]: https://iceberg.apache.org/docs/latest/metrics-reporting

use std::collections::HashMap;
use std::fmt::Debug;
use std::sync::Arc;
use std::time::Duration;

use async_trait::async_trait;
use tracing::info;

use crate::TableIdent;
use crate::expr::Predicate;
use crate::spec::SchemaId;

/// This trait defines the API for reporting metrics of table operations.
///
/// Refer to the [Iceberg docs] for details.
///
/// [Iceberg docs]: https://iceberg.apache.org/docs/latest/metrics-reporting/
#[async_trait]
pub(crate) trait MetricsReporter: Debug + Send + Sync {
/// Indicates that an operation is done by reporting a MetricsReport.
///
/// Any errors are expected to be handled internally.
async fn report(&self, report: MetricsReport);
}

/// An enum of all metrics reports.
#[derive(Debug)]
pub(crate) enum MetricsReport {
/// A Table Scan report that contains all relevant information from a Table Scan.
Scan {
table: TableIdent,
snapshot_id: i64,
schema_id: SchemaId,

/// If None, the scan is an unfiltered full table scan.
filter: Option<Arc<Predicate>>,

/// If None, the scan projects all fields.
// TODO: We could default to listing all field names in those cases: check what Java is doing.
projected_field_names: Option<Vec<String>>,
Comment (Contributor Author):

TODO: The list of field names would be more helpful in reporting than an empty value

projected_field_ids: Arc<Vec<i32>>,

metrics: Arc<ScanMetrics>,
metadata: HashMap<String, String>,
},
}

/// Carries all metrics for a particular scan.
#[derive(Debug)]
pub(crate) struct ScanMetrics {
Comment (Contributor Author):

Note that the Java implementation uses special types for the metrics (e.g. TimerResult.java and CounterResult.java). They include a value and a unit, but I felt that the ScanMetrics field names and their types together convey everything we need. The RestMetricReporter will need to emit reports that follow that format, but I omitted it from the general-purpose ScanMetrics.

Happy for any feedback!
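For comparison, the Java-style result types mentioned above might look roughly like this in Rust (a hypothetical sketch of the shape, with assumed field names — the PR instead lets ScanMetrics' field names and Rust types such as Duration, u32, and u64 carry the unit):

```rust
use std::time::Duration;

// Sketch of a CounterResult: value paired with an explicit unit.
struct CounterResult {
    unit: &'static str,
    value: u64,
}

// Sketch of a TimerResult: duration paired with a unit and invocation count.
struct TimerResult {
    time_unit: &'static str,
    total_duration: Duration,
    count: u64,
}

// A REST reporter could translate plain metrics into this wire shape
// at the serialization boundary.
fn to_counter(total_file_size_in_bytes: u64) -> CounterResult {
    CounterResult {
        unit: "bytes",
        value: total_file_size_in_bytes,
    }
}

fn main() {
    let c = to_counter(1024);
    assert_eq!((c.unit, c.value), ("bytes", 1024));

    let t = TimerResult {
        time_unit: "nanoseconds",
        total_duration: Duration::from_millis(5),
        count: 1,
    };
    assert_eq!(t.total_duration.as_nanos(), 5_000_000);
}
```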

pub(crate) total_planning_duration: Duration,

// Manifest-level metrics, computed by walking the snapshot's manifest list
// file entries and checking which manifests match the scan's predicates.
pub(crate) total_data_manifests: u32,
pub(crate) total_delete_manifests: u32,
pub(crate) skipped_data_manifests: u32,
pub(crate) skipped_delete_manifests: u32,
pub(crate) scanned_data_manifests: u32,
pub(crate) scanned_delete_manifests: u32,

// Data file-level metrics.
pub(crate) result_data_files: u32,
pub(crate) skipped_data_files: u32,
pub(crate) total_file_size_in_bytes: u64,

// Delete file-level metrics.
pub(crate) result_delete_files: u32,
pub(crate) skipped_delete_files: u32,
pub(crate) total_delete_file_size_in_bytes: u64,

pub(crate) indexed_delete_files: u32,
pub(crate) equality_delete_files: u32,
pub(crate) positional_delete_files: u32,
}

/// A reporter that logs the metrics to the console.
#[derive(Clone, Debug)]
pub(crate) struct LoggingMetricsReporter {}

impl LoggingMetricsReporter {
pub(crate) fn new() -> Self {
Self {}
}
}

#[async_trait]
impl MetricsReporter for LoggingMetricsReporter {
async fn report(&self, report: MetricsReport) {
match report {
MetricsReport::Scan {
table,
snapshot_id,
schema_id,
filter,
projected_field_names,
projected_field_ids,
metrics,
metadata,
} => {
info!(
Comment (Contributor Author):

I don't think it's a good idea to use debug-formatted values here. I was struggling a lot with the tracing API, and this is the best I could come up with so far.
I didn't really want to serialize the struct into JSON, nor did I know how to implement fmt::Display for values such that they make sense across tracing subscribers.
Any suggestions welcome!
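One way around the Debug-formatting concern is to give the logged value a Display impl and record it with `%` instead of `?` in the tracing macro. A sketch under assumptions — `Filter` here is a hypothetical stand-in for the crate's `Predicate`, and the rendered syntax is invented:

```rust
use std::fmt;

// Stand-in predicate type with a human-readable rendering.
enum Filter {
    Eq(&'static str, i64),
    And(Box<Filter>, Box<Filter>),
}

impl fmt::Display for Filter {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Filter::Eq(field, value) => write!(f, "{field} = {value}"),
            Filter::And(lhs, rhs) => write!(f, "({lhs}) AND ({rhs})"),
        }
    }
}

fn main() {
    let filter = Filter::And(
        Box::new(Filter::Eq("id", 7)),
        Box::new(Filter::Eq("year", 2025)),
    );
    // With tracing this would be recorded as `filter = %filter`,
    // producing the same string for every subscriber.
    assert_eq!(filter.to_string(), "(id = 7) AND (year = 2025)");
}
```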

Comment (Contributor Author):

I could also use some feedback about the degree to which we want to mimic the Java implementation's log records.

table = %table,
snapshot_id = snapshot_id,
schema_id = schema_id,
filter = ?filter,
projected_field_names = ?projected_field_names,
projected_field_ids = ?projected_field_ids,
scan_metrics.total_planning_duration = ?metrics.total_planning_duration,
scan_metrics.total_data_manifests = metrics.total_data_manifests,
scan_metrics.total_delete_manifests = metrics.total_delete_manifests,
scan_metrics.scanned_data_manifests = metrics.scanned_data_manifests,
scan_metrics.scanned_delete_manifests = metrics.scanned_delete_manifests,
scan_metrics.skipped_data_manifests = metrics.skipped_data_manifests,
scan_metrics.skipped_delete_manifests = metrics.skipped_delete_manifests,
scan_metrics.result_data_files = metrics.result_data_files,
scan_metrics.result_delete_files = metrics.result_delete_files,
scan_metrics.skipped_data_files = metrics.skipped_data_files,
scan_metrics.skipped_delete_files = metrics.skipped_delete_files,
scan_metrics.total_file_size_in_bytes = metrics.total_file_size_in_bytes,
scan_metrics.total_delete_file_size_in_bytes = metrics.total_delete_file_size_in_bytes,
scan_metrics.indexed_delete_files = metrics.indexed_delete_files,
scan_metrics.equality_delete_files = metrics.equality_delete_files,
scan_metrics.positional_delete_files = metrics.positional_delete_files,
metadata = ?metadata,
"Received metrics report"
);
}
}
}
}
27 changes: 24 additions & 3 deletions crates/iceberg/src/scan/context.rs
Expand Up @@ -23,6 +23,7 @@ use futures::{SinkExt, TryFutureExt};
use crate::delete_file_index::DeleteFileIndex;
use crate::expr::{Bind, BoundPredicate, Predicate};
use crate::io::object_cache::ObjectCache;
use crate::scan::metrics::ManifestMetrics;
use crate::scan::{
BoundPredicates, ExpressionEvaluatorCache, FileScanTask, ManifestEvaluatorCache,
PartitionFilterCache,
Expand Down Expand Up @@ -186,16 +187,25 @@ impl PlanContext {
tx_data: Sender<ManifestEntryContext>,
delete_file_idx: DeleteFileIndex,
delete_file_tx: Sender<ManifestEntryContext>,
) -> Result<Box<impl Iterator<Item = Result<ManifestFileContext>> + 'static>> {
) -> Result<(Vec<Result<ManifestFileContext>>, ManifestMetrics)> {
Comment (Contributor Author):

Since we were returning a vector and this function was only called in a single place, I took the liberty of changing the return value. This somewhat simplified passing the result to a spawned thread, because Vec is Send + Sync when its items are.

I've also extended the TODO comment below for future reference, because I've added another obstacle to simply using an iterator here: the ManifestMetrics are now continuously mutated in the loop. If we used an iterator instead, we couldn't as easily (I think) pass around the mutable reference.

let manifest_files = manifest_list.entries().iter();

// TODO: Ideally we could ditch this intermediate Vec as we return an iterator.
// TODO: Ideally we could ditch this intermediate Vec as we can return
// an iterator over the results. Updates to the manifest metrics somewhat
// complicate this because they need to be serialized somewhere, and an
// iterator can't easily take ownership of the metrics.
// A vec allows us to apply the mutations within this function.
// A vec also implicitly implements Send and Sync, meaning we can pass
// it around more easily in the concurrent planning step.
let mut filtered_mfcs = vec![];

let mut metrics = ManifestMetrics::default();
for manifest_file in manifest_files {
let tx = if manifest_file.content == ManifestContentType::Deletes {
metrics.total_delete_manifests += 1;
delete_file_tx.clone()
} else {
metrics.total_data_manifests += 1;
tx_data.clone()
};

Expand All @@ -212,6 +222,10 @@ impl PlanContext {
)
.eval(manifest_file)?
{
match manifest_file.content {
ManifestContentType::Data => metrics.skipped_data_manifests += 1,
ManifestContentType::Deletes => metrics.skipped_delete_manifests += 1,
}
continue;
}

Expand All @@ -230,7 +244,14 @@ impl PlanContext {
filtered_mfcs.push(Ok(mfc));
}

Ok(Box::new(filtered_mfcs.into_iter()))
// They're not yet scanned, but will be scanned concurrently in the
// next processing step.
metrics.scanned_data_manifests =
metrics.total_data_manifests - metrics.skipped_data_manifests;
metrics.scanned_delete_manifests =
metrics.total_delete_manifests - metrics.skipped_delete_manifests;

Ok((filtered_mfcs, metrics))
}
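The manifest bookkeeping above follows a simple invariant — each manifest is either skipped by the evaluator or forwarded to the next (concurrent) step — so the "scanned" counts can be derived once the loop finishes. A minimal sketch with a reduced struct (real ManifestMetrics has more fields):

```rust
// Reduced stand-in for the ManifestMetrics accumulated in plan_files.
#[derive(Debug, Default, PartialEq)]
struct ManifestMetrics {
    total_data_manifests: u32,
    skipped_data_manifests: u32,
    scanned_data_manifests: u32,
}

// Each bool stands for "this manifest matched the scan's predicates".
fn plan(matches_predicate: &[bool]) -> ManifestMetrics {
    let mut metrics = ManifestMetrics::default();
    for &keep in matches_predicate {
        metrics.total_data_manifests += 1;
        if !keep {
            metrics.skipped_data_manifests += 1;
        }
    }
    // Not yet scanned, but will be in the next processing step.
    metrics.scanned_data_manifests =
        metrics.total_data_manifests - metrics.skipped_data_manifests;
    metrics
}

fn main() {
    let metrics = plan(&[true, false, true, true]);
    assert_eq!(metrics.total_data_manifests, 4);
    assert_eq!(metrics.skipped_data_manifests, 1);
    assert_eq!(metrics.scanned_data_manifests, 3);
}
```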

fn create_manifest_file_context(
Expand Down