Skip to content

Measure with string type, ends up with SUM aggregation in the generated SQL to pre-aggregations #9462

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
kevinleeTCA opened this issue Apr 14, 2025 · 4 comments
Assignees
Labels
bug Something isn't working pre-aggregations Issues related to pre-aggregations

Comments

@kevinleeTCA
Copy link

kevinleeTCA commented Apr 14, 2025

To Reproduce
This is my cube definition, but you can try this with any string/text based columns:

cube(`LE_LegalEntitiesMetadata`, {
    sql: `select * from data_export.legal_entities_metadata`,

    public: false,

    preAggregations: {
        rollup: {
            dimensions: [
                CUBE.legal_entity_id,
            ],
            measures: [
                CUBE.display_names
            ],
            indexes: {
                idx: {
                    columns: [CUBE.legal_entity_id]
                }
            },
            refresh_key: {
                every: `1 hour`,
            }
        }
    },


measures: {
        display_names: {
            sql: `STRING_AGG(${CUBE.display_name}::TEXT, ', ')`,
            type: `string`,
            title: `Investors/Company owners`
        }
    },

    dimensions: {
        legal_entity_id: {
            sql: `legal_entity_id`,
            type: `string`,
            primary_key: true,
            shown: true
        },
        entity_type: {
            sql: `entity_type`,
            type: `string`,
        },
        display_name: {
            sql: `display_name`,
            type: `string`,
            title: `Investors/Company owners`
        }
    },

    dataSource: `legal_entity`
});

I am using STRING_AGG measure, but the query:

{
  "dimensions": [
    "LE_LegalEntitiesMetadata.legal_entity_id"
  ],
  "measures": [
    "LE_LegalEntitiesMetadata.display_names"
  ]
}

ends up using sum in the generated SQL:

SELECT
  `l_e__legal_entities_metadata__legal_entity_id` `l_e__legal_entities_metadata__legal_entity_id`,
  sum(`l_e__legal_entities_metadata__display_names`) `l_e__legal_entities_metadata__display_names`
FROM
  dev_pre_aggregations.l_e__legal_entities_metadata_rollup AS `l_e__legal_entities_metadata__rollup`
GROUP BY
  1
ORDER BY
  2 DESC
LIMIT
  10000

Image
Image

Expected behavior
It should remain as STRING_AGG

  STRING_AGG(
    "l_e__legal_entities_metadata".display_name :: TEXT,
    ', '
  ) "l_e__legal_entities_metadata__display_names"

Version:
1.1.18

@igorlukanin
Copy link
Member

I can confirm it reproduces in 1.2.34.

However, it only reproduces when the query hits a pre-aggregation. The query to the upstream data source contains correct aggregation.

@igorlukanin igorlukanin added bug Something isn't working pre-aggregations Issues related to pre-aggregations labels Apr 14, 2025
@igorlukanin igorlukanin self-assigned this Apr 14, 2025
@igorlukanin igorlukanin changed the title Measure of STRING_AGG with string type, ends up with sum operator in the generated SQL Measure with string type, ends up with SUM aggregation in the generated SQL to pre-aggregations Apr 14, 2025
@kevinleeTCA
Copy link
Author

I can confirm it reproduces in 1.2.34.

However, it only reproduces when the query hits a pre-aggregation. The query to the upstream data source contains correct aggregation.

Thank you @igorlukanin , yes it happens when I tried to using the roll up pre-aggregation, as I need this measure in the rollup_joins to be able to have date aggregated from different sources. (so a basica rollup pre-aggregation is needed in my case, in order to join other rollups). Any idea on how we could fix it or walk around it ?

@igorlukanin
Copy link
Member

Sorry, it's unclear to me where the bug exactly is. I could've suggested not using pre-aggs for now but since you need to do a rollup_join it doesn't seem to be an option.

Maybe you can switch from rollup_join pre-aggs to using an external query engine (e.g., Trino) to join the data from different data sources while we're figuring this out; however, I understand that this might be quite an effort.

@kevinleeTCA
Copy link
Author

Maybe you can switch from rollup_join pre-aggs to using an external query engine (e.g., Trino) to join the data from different data sources while we're figuring this out; however, I understand that this might be quite an effort.

Something similar, we have to join the data from our dbt pipeline before expose it to cube, it will work. Would be great if this bug can be fixed, and cube can support more string based aggregation in pre-aggregation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working pre-aggregations Issues related to pre-aggregations
Projects
None yet
Development

No branches or pull requests

2 participants