Description:
I’ve encountered a persistent error when using ggalluvial’s geom_flow() to create an alluvial plot: Error in geom_flow(): Data is not in a recognized alluvial form. The error occurs even with a minimal, valid data.frame that matches the structure of the Titanic dataset (which works with geom_flow in the package examples). The data passes is_alluvia_form(), suggesting the issue lies in geom_flow()’s setup_data() function. This appears to be a bug in ggalluvial, as the same error occurs with both the CRAN version and the development version of the package.
Steps to Reproduce:
- Load the packages and build a minimal dataset:
library(ggalluvial)
library(ggplot2)
# Minimal data: 2 countries, 2 years, 2 energy types
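test_data <- data.frame(
  country = rep(c("Egypt", "Kenya"), each = 4),
  # each = 2, times = 2 yields 2000, 2000, 2001, 2001 per country, so every
  # country/year pair covers both energy types (as in the str() output in the reprex below)
  year = factor(rep(c("2000", "2001"), each = 2, times = 2)),
  energy_type = factor(rep(c("non_pollutant", "pollutant"), times = 4)),
  demand = c(13.7, 60.77, 15.2, 64.17, 1.31, 2.04, 2.38, 1.53)
)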
# Verify structure
str(test_data)
class(test_data) # Confirms "data.frame"
is_alluvia_form(test_data, axes = "year", id = "country", weight = "demand") # Should return TRUE
- Attempt to create an alluvial plot with geom_flow():
ggplot(test_data,
       aes(x = year,
           y = demand,
           stratum = energy_type,
           alluvium = country,
           fill = energy_type)) +
  geom_flow(stat = "flow") +
  geom_stratum(width = 0.2) +
  scale_x_discrete(expand = c(0.1, 0.1)) +
  theme_minimal()
Expected Behavior:
The plot should render an alluvial diagram, with flows showing how each country’s demand splits between non_pollutant and pollutant energy types across the years 2000 and 2001. The Titanic dataset from the ggalluvial examples works with this setup, and test_data is structured similarly (data.frame with factors for axes and numeric weights).
Actual Behavior:
The plot fails with the following error:
Error in `geom_flow()`:
! Problem while computing stat.
ℹ Error occurred in the 1st layer.
Caused by error in `setup_data()`:
! Data is not in a recognized alluvial form (see `help('alluvial-data')` for details).
Traceback:
rlang::last_trace()
<error/rlang_error>
Error in `geom_flow()`:
! Problem while computing stat.
ℹ Error occurred in the 1st layer.
Caused by error in `setup_data()`:
! Data is not in a recognized alluvial form (see `help('alluvial-data')` for details).
---
Backtrace:
▆
1. ├─base (local) `<fn>`(x)
2. └─ggplot2:::print.ggplot(x)
3. ├─ggplot2::ggplot_build(x)
4. └─ggplot2:::ggplot_build.ggplot(x)
5. └─ggplot2:::by_layer(...)
6. ├─rlang::try_fetch(...)
7. │ ├─base::tryCatch(...)
8. │ │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
9. │ │ └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
10. │ │ └─base (local) doTryCatch(return(expr), name, parentenv, handler)
11. │ └─base::withCallingHandlers(...)
12. └─ggplot2 (local) f(l = layers[[i]], d = data[[i]])
13. └─l$compute_statistic(d, layout)
14. └─ggplot2 (local) compute_statistic(..., self = self)
15. └─self$stat$setup_data(data, self$computed_stat_params)
16. └─ggalluvial (local) setup_data(...)
17. └─base::stop("Data is not in a recognized alluvial form ", "(see `help('alluvial-data')` for details).")
Run rlang::last_trace(drop = FALSE) to see 5 hidden frames.
Environment:
- R version: [Run R.version.string to get this, e.g., "R version 4.3.1 (2023-06-16)"]
- ggalluvial version: [Run packageVersion("ggalluvial"), e.g., 0.12.5 or dev version]
- Other packages loaded: [Run sessionInfo() and include relevant details, e.g., ggplot2 version]
- Operating System: [Specify your OS, e.g., Windows 11, macOS Ventura 13.5, etc.]
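The environment details above can be collected in one go. This is only a convenience sketch: `pkg_version()` is a hypothetical helper defined here, not part of any package, and it returns NA for a package that is not installed.

```r
# Hypothetical helper: version of an installed package as a string,
# or NA if the package is not installed
pkg_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# Named character vector with the details requested in this section
env_info <- c(
  R = R.version.string,
  ggalluvial = pkg_version("ggalluvial"),
  ggplot2 = pkg_version("ggplot2"),
  OS = utils::sessionInfo()$running
)
print(env_info)
```

The full `sessionInfo()` output is still the most complete thing to paste into a report; the vector above is just the short form.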
Additional Context:
The data structure matches the requirements for geom_flow():
- country (character): Alluvium identifier (like individuals in the Titanic dataset).
- year (factor): X-axis for time steps.
- energy_type (factor): Stratum for each time step.
- demand (numeric): Weight for the flows.
- is_alluvia_form(test_data, axes = "year", id = "country", weight = "demand") returns TRUE, indicating the data is in a valid alluvial form.
- The same error occurs with a larger dataset (4 countries, 5 years) and persists even after converting from tibble to data.frame, subsetting, and testing with the development version of ggalluvial.
- The Titanic dataset from the ggalluvial examples works, but custom data with a similar structure fails.
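One further check that may help narrow this down, written in base R so it does not depend on ggalluvial. It counts how often each alluvium value (country) occurs at each x value (year). As I understand the long format described in help('alluvial-data'), each alluvium is expected at most once per axis; whether this is what setup_data() actually rejects is an assumption on my part, not something I have confirmed in the source. The data below are laid out as in the str() output in the reprex.

```r
# 2 countries x 2 years x 2 energy types, laid out as in the str() output
test_data <- data.frame(
  country = rep(c("Egypt", "Kenya"), each = 4),
  year = factor(rep(c("2000", "2001"), each = 2, times = 2)),
  energy_type = factor(rep(c("non_pollutant", "pollutant"), times = 4)),
  demand = c(13.7, 60.77, 15.2, 64.17, 1.31, 2.04, 2.38, 1.53)
)

# Number of rows for each alluvium (country) at each axis value (year)
pair_counts <- table(test_data$country, test_data$year)
print(pair_counts)

# TRUE if any country occurs more than once within the same year
has_duplicate_pairs <- any(duplicated(test_data[c("country", "year")]))
print(has_duplicate_pairs)
```

Here every country occurs twice per year (once per energy type), so `has_duplicate_pairs` is TRUE. If that turns out to be the trigger, a more specific error message would make it much easier to spot.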
Minimal Reproducible Example (Reprex):
Here’s a self-contained reprex to reproduce the issue:
# Load packages
library(ggalluvial)
library(ggplot2)
# Create minimal data
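test_data <- data.frame(
  country = rep(c("Egypt", "Kenya"), each = 4),
  # each = 2, times = 2 yields 2000, 2000, 2001, 2001 per country,
  # which is the layout shown in the str() output below
  year = factor(rep(c("2000", "2001"), each = 2, times = 2)),
  energy_type = factor(rep(c("non_pollutant", "pollutant"), times = 4)),
  demand = c(13.7, 60.77, 15.2, 64.17, 1.31, 2.04, 2.38, 1.53)
)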
# Verify data
str(test_data)
# 'data.frame': 8 obs. of 4 variables:
# $ country : chr "Egypt" "Egypt" "Egypt" "Egypt" ...
# $ year : Factor w/ 2 levels "2000","2001": 1 1 2 2 1 1 2 2
# $ energy_type: Factor w/ 2 levels "non_pollutant",..: 1 2 1 2 1 2 1 2
# $ demand : num 13.7 60.77 15.2 64.17 1.31 2.04 2.38 1.53
class(test_data) # "data.frame"
is_alluvia_form(test_data, axes = "year", id = "country", weight = "demand") # TRUE
# Attempt to plot
ggplot(test_data,
       aes(x = year,
           y = demand,
           stratum = energy_type,
           alluvium = country,
           fill = energy_type)) +
  geom_flow(stat = "flow") +
  geom_stratum(width = 0.2) +
  scale_x_discrete(expand = c(0.1, 0.1)) +
  theme_minimal()
Workaround:
As a workaround, I switched to the networkD3 package, which successfully created an interactive Sankey diagram with the same data. However, geom_alluvium() from ggalluvial might work as an alternative within the package (I haven’t tested this yet due to time constraints).
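A possible workaround that stays inside ggalluvial, which I have not run yet: give every country/energy_type combination its own alluvium id, so that each id occurs exactly once per year. The column name `flow_id` is made up for this sketch, and the assumption that this is enough for geom_flow() to accept the data is mine. The plot is only built when ggalluvial is installed.

```r
# Same data as in the reprex (layout as in the str() output)
test_data <- data.frame(
  country = rep(c("Egypt", "Kenya"), each = 4),
  year = factor(rep(c("2000", "2001"), each = 2, times = 2)),
  energy_type = factor(rep(c("non_pollutant", "pollutant"), times = 4)),
  demand = c(13.7, 60.77, 15.2, 64.17, 1.31, 2.04, 2.38, 1.53)
)

# Hypothetical id column: one alluvium per country/energy_type combination
test_data$flow_id <- interaction(test_data$country, test_data$energy_type)

# Each id should now occur at most once in each year
id_is_unique_per_year <- !any(duplicated(test_data[c("flow_id", "year")]))
print(id_is_unique_per_year)

# Build (but do not print) the plot only if both packages are available
if (requireNamespace("ggalluvial", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  library(ggalluvial)
  p <- ggplot(test_data,
              aes(x = year,
                  y = demand,
                  stratum = energy_type,
                  alluvium = flow_id,
                  fill = energy_type)) +
    geom_flow() +
    geom_stratum(width = 0.2) +
    theme_minimal()
}
```

Printing `p` would show whether the error goes away with this id; if it does, that points at the alluvium mapping rather than at the data.frame itself.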
Suggested Fix:
- Investigate setup_data() in ggalluvial to identify why it rejects valid data (passing is_alluvia_form()).
- Check for potential issues with factor levels, numeric weights, or internal assumptions about data structure.
- Ensure compatibility with plain data.frame inputs, as the Titanic dataset works but custom data fails.
Additional Notes:
I also encountered a similar issue with ggsankey’s geom_sankey(), suggesting a broader problem with alluvial/Sankey implementations in R. I’ll file a separate bug report for ggsankey later. For now, focusing on ggalluvial, this bug significantly impacts usability for custom datasets, and I’d appreciate any guidance or fixes.