-
Notifications
You must be signed in to change notification settings - Fork 114
Open
Labels
bugSomething isn't workingSomething isn't workingtriageTasks that need to be triagedTasks that need to be triaged
Description
Is this a new bug in metricflow?
- I believe this is a new bug in metricflow
- I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
Specifying --where on derived metrics doesn't work as expected as the constructed query has the where filter applied before the metric is created.
Expected Behavior
In the constructed query where filter should be applied after the metric is created,
Notice how the constructed query has the where filter applied incorrectly.
Steps To Reproduce
semantic_model
semantic_models:
- name: sem_perf
defaults:
agg_time_dimension: activity_date
model: ref('perf')
entities:
- name: sp
type: primary
expr: '1'
- name: region_mp_sku
type: unique
expr: region_mp_sku
dimensions:
- name: activity_date
type: time
expr: activity_date
type_params:
time_granularity: day
measures:
- name: pv_cost_lc
expr: CASE WHEN channel = '3' THEN cost_lc ELSE 0 END
agg: sum
create_metric: true
metrics:
- name: pv_cost_lc_category
type: derived
label: pv_cost_lc_category
type_params:
expr: |
CASE
WHEN pv_cost_lc = 0 THEN 'no_spend'
WHEN pv_cost_lc <= approx_percentile(CASE WHEN pv_cost_lc != 0 THEN pv_cost_lc END, 0.33) OVER () THEN 'low'
WHEN pv_cost_lc <= approx_percentile(CASE WHEN pv_cost_lc != 0 THEN pv_cost_lc END, 0.90) OVER () THEN 'medium'
ELSE 'high'
END
metrics:
- name: pv_cost_lc
Query
mf query \
--metrics pv_cost_lc,pv_cost_lc_category \
--group-by region_mp_sku__activity_date \
--where "pv_cost_lc_category = 'medium'" \
--start-time '2025-07-01' \
--end-time '2025-07-14' \
--explain
SELECT
COALESCE(subq_10.region_mp_sku__activity_date__day, subq_18.region_mp_sku__activity_date__day) AS region_mp_sku__activity_date__day
, MAX(subq_10.pv_cost_lc) AS pv_cost_lc
, MAX(subq_18.pv_cost_lc_category) AS pv_cost_lc_category
FROM (
SELECT
region_mp_sku__activity_date__day
, SUM(pv_cost_lc) AS pv_cost_lc
FROM (
SELECT
DATE_TRUNC('day', activity_date) AS region_mp_sku__activity_date__day
, CASE WHEN channel = '3' THEN cost_lc ELSE 0 END AS pv_cost_lc
FROM "trino"."dev"."perf" sem_perf_src_10000
WHERE DATE_TRUNC('day', activity_date) BETWEEN timestamp '2025-07-01' AND timestamp '2025-07-14'
) subq_7
WHERE pv_cost_lc_category = 'medium'
GROUP BY
region_mp_sku__activity_date__day
) subq_10
FULL OUTER JOIN (
SELECT
region_mp_sku__activity_date__day
, CASE
WHEN pv_cost_lc = 0 THEN 'no_spend'
WHEN pv_cost_lc <= approx_percentile(CASE WHEN pv_cost_lc != 0 THEN pv_cost_lc END, 0.33) OVER () THEN 'low'
WHEN pv_cost_lc <= approx_percentile(CASE WHEN pv_cost_lc != 0 THEN pv_cost_lc END, 0.90) OVER () THEN 'medium'
ELSE 'high'
END AS pv_cost_lc_category
FROM (
SELECT
region_mp_sku__activity_date__day
, SUM(pv_cost_lc) AS pv_cost_lc
FROM (
SELECT
DATE_TRUNC('day', activity_date) AS region_mp_sku__activity_date__day
, CASE WHEN channel = '3' THEN cost_lc ELSE 0 END AS pv_cost_lc
FROM "trino"."dev"."perf" sem_perf_src_10000
WHERE DATE_TRUNC('day', activity_date) BETWEEN timestamp '2025-07-01' AND timestamp '2025-07-14'
) subq_14
WHERE pv_cost_lc_category = 'medium'
GROUP BY
region_mp_sku__activity_date__day
) subq_17
) subq_18
ON
subq_10.region_mp_sku__activity_date__day = subq_18.region_mp_sku__activity_date__day
GROUP BY
COALESCE(subq_10.region_mp_sku__activity_date__day, subq_18.region_mp_sku__activity_date__day)
Relevant log output
N/A
Environment
- OS: Mac
- Python:3.11.11
- dbt: 1.8.9
- metricflow: 0.7.1
Which database are you using?
other (mention it in "Additional Context")
Additional Context
Trino
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't workingtriageTasks that need to be triagedTasks that need to be triaged