Skip to content

Fix handling of missing values in GRIB files #966

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

tiemvanderdeure
Copy link
Collaborator

Closes #964

@aasdelat can you confirm this fixes your issue? I don't have a good grib file to test on.

@tiemvanderdeure
Copy link
Collaborator Author

rafaqz/DimensionalData.jl#990 introduced a method ambiguity in Rasters. Can fix this by adding a dispatch but it would be nice to have a better fix for this so we don't have to keep adding dispatches

@aasdelat
Copy link

aasdelat commented May 7, 2025

Closes #964

@aasdelat can you confirm this fixes your issue? I don't have a good grib file to test on.

You can download two of them from here:
https://mega.nz/file/4gZE3A4K#xWFXWlHil2lAM5NpNAC-WTSEIa6PhYK8BTokOoW7iwE
https://mega.nz/file/k9Zl0bCA#RFMs-F7rgjc1EG8MWnEL2yjmJoqR6o7ITNzk_hL_T7M

@tiemvanderdeure
Copy link
Collaborator Author

Now I get a different error, because the parent of the GRIBDataset.Variable is a Vector. But I can read lazily

@tiemvanderdeure
Copy link
Collaborator Author

Okay these files have geometry lookups, but these are not detected so longitude is read in as the values by default. On this branch this works - but you have to specify name and the lon/lat values aren't read:

ras = Raster(filepath; lazy = true, name = :msl)
ras[1]

@asinghvi17 this might work very soon with your work on geometry lookups?

@rafaqz
Copy link
Owner

rafaqz commented May 7, 2025

the CF branch I'm working on might read this properly now, if you want to check add Rasters#cdm_overhaul

(geometry lookups are being rolled into just doing most of CF properly as a pretty massive rewrite of the common data model internals)

@rafaqz
Copy link
Owner

rafaqz commented May 7, 2025

Also for the record I don't really like that we are treating missingval separately for each dataset type, CommonDataModel.jl should have these methods instead at some stage...

But fixing it here makes sense for now

@rafaqz
Copy link
Owner

rafaqz commented May 7, 2025

Now I get a different error, because the parent of the GRIBDataset.Variable is a Vector. But I can read lazily

@tiemvanderdeure is this working now? could you share the error output?

@tiemvanderdeure
Copy link
Collaborator Author

On Rasters#cdm_overhaul I am getting yet another error. Maybe a file like this is a good test case?

julia> Raster(filepath; lazy = true)
ERROR: MethodError: Cannot `convert` an object of type Vector{String} to an object of type Tuple
The function `convert` exists, but no method is defined for this combination of argument types.

Closest candidates are:
  (::Type{T})(::Any) where T<:Tuple
   @ Base tuple.jl:455
  convert(::Type{T}, ::T) where T<:Tuple
   @ Base essentials.jl:600
  convert(::Type{T}, ::T) where T
   @ Base Base.jl:126
  ...

Stacktrace:
  [1] setindex!(A::Vector{Tuple}, x::Vector{String}, i::Int64)
    @ Base .\array.jl:987
  [2] _organise_dataset(ds::GRIBDataset{Float64, 2, Missing}, names::Vector{String}, group::Rasters.NoKW)
    @ Rasters C:\Users\tsh371\.julia\packages\Rasters\C3CfE\src\sources\commondatamodel.jl:295
  [3] _raster(ds::GRIBDataset{…}; dims::Rasters.NoKW, refdims::Rasters.NoKW, name::Rasters.NoKW, group::Rasters.NoKW, filename::String, metadata::Rasters.NoKW, missingval::Rasters.NoKW, crs::Rasters.NoKW, mappedcrs::Rasters.NoKW, source::Rasters.GRIBsource, replace_missing::Rasters.NoKW, coerce::typeof(convert), scaled::Rasters.NoKW, verbose::Bool, write::Bool, lazy::Bool, dropband::Bool, checkmem::Bool, raw::Bool, mod::Rasters.NoKW, f::typeof(identity))
    @ Rasters C:\Users\tsh371\.julia\packages\Rasters\C3CfE\src\sources\commondatamodel.jl:970
  [4] _raster
    @ C:\Users\tsh371\.julia\packages\Rasters\C3CfE\src\sources\commondatamodel.jl:942 [inlined]
  [5] #78
    @ C:\Users\tsh371\.julia\packages\Rasters\C3CfE\src\array.jl:328 [inlined]
  [6] _open(f::Rasters.var"#78#79"{…}, source::Rasters.GRIBsource, ds::GRIBDataset{…}; name::Rasters.NoKW, group::Nothing, mod::Nothing, kw::@Kwargs{})
    @ Rasters C:\Users\tsh371\.julia\packages\Rasters\C3CfE\src\sources\commondatamodel.jl:120
  [7] _open(f::Function, source::Rasters.GRIBsource, filename::String; write::Bool, kw::@Kwargs{mod::Nothing})
    @ Rasters C:\Users\tsh371\.julia\packages\Rasters\C3CfE\src\sources\commondatamodel.jl:110
  [8] _open
    @ C:\Users\tsh371\.julia\packages\Rasters\C3CfE\src\sources\commondatamodel.jl:107 [inlined]
  [9] #_open#26
    @ C:\Users\tsh371\.julia\packages\Rasters\C3CfE\src\sources\sources.jl:110 [inlined]
 [10] Raster(filename::String; source::Rasters.NoKW, kw::@Kwargs{lazy::Bool})
    @ Rasters C:\Users\tsh371\.julia\packages\Rasters\C3CfE\src\array.jl:327
 [11] top-level scope
    @ REPL[3]:1
Some type information was truncated. Use `show(err)` to see complete types.

@tiemvanderdeure
Copy link
Collaborator Author

Bit of a more detailed report:

Let me just share the output of GRIBDataset so we know what this file is:

julia> GRIBDataset(filepath)
Dataset: C:/Users/tsh371/Downloads/ERA5_N320_1h_sl_sfc_NH15S_1980-12-01_1980-12-31_msl.grib
Group: /

Dimensions
   values = 338880
   valid_time = 744

Variables
  valid_time   (744)
    Datatype:    Dates.DateTime (Int64)
    Dimensions:  valid_time
    Attributes:
     units                = seconds since 1970-01-01T00:00:00
     calendar             = proleptic_gregorian
     long_name            = time
     standard_name        = time

  longitude   (338880)
    Datatype:    Float64 (Float64)
    Dimensions:  values
    Attributes:
     units                = degrees_east
     long_name            = longitude
     standard_name        = longitude

  latitude   (338880)
    Datatype:    Float64 (Float64)
    Dimensions:  values
    Attributes:
     units                = degrees_north
     long_name            = latitude
     standard_name        = latitude

  msl   (338880 × 744)
    Datatype:    Union{Missing, Float64} (Float64)
    Dimensions:  values × valid_time
    Attributes:
     units                = Pa
     long_name            = Mean sea level pressure
     standard_name        = air_pressure_at_mean_sea_level

If I just call Raster on this file, it tries to read the longitude values. On main this already errors with the conversion error.

julia> Raster(filepath; lazy = true)
┌ 338880-element Raster{Float64, 1} longitude ┐
├─────────────────────────────────────── dims ┤
  ↓ values
├─────────────────────────────────────────────┴───────────── metadata ┐
  Metadata{Rasters.GRIBsource} of Dict{String, Any} with 3 entries:
  "units"         => "degrees_east"
  "long_name"     => "longitude"
  "standard_name" => "longitude"
├───────────────────────────────────────────────────────────── raster ┤
  extent: Extent(values = (1, 338880),)
  filename: C:/Users/tsh371/Downloads/ERA5_N320_1h_sl_sfc_NH15S_1980-12-01_1980-12-31_msl.grib
└─────────────────────────────────────────────────────────────────────┘

But eagerly reading this doesn't work, because parent(GRIBDatasets.Variable) is a Vector.

julia> Raster(filepath)
ERROR: MethodError: no method matching readblock!(::Vector{Float64}, ::Vector{Float64}, ::UnitRange{Int64})
The function `readblock!` exists, but no method is defined for this combination of argument types.

Closest candidates are:
  readblock!(::DiskArrays.PaddedDiskArray, ::Any, ::AbstractRange...)
   @ DiskArrays C:\Users\tsh371\.julia\dev\DiskArrays\src\pad.jl:73
  readblock!(::Rasters.RasterDiskArray, ::Any, ::AbstractUnitRange...)
   @ Rasters C:\Users\tsh371\.julia\dev\Rasters\src\filearray.jl:113
  readblock!(::AbstractRaster, ::Any, ::AbstractUnitRange...)
   @ Rasters C:\Users\tsh371\.julia\dev\Rasters\src\array.jl:112
  ...
Stacktrace:
  [1] readblock!(A::GRIBDatasets.Variable{Float64, 1, Vector{…}, GRIBDataset{…}}, aout::Vector{Float64}, i::UnitRange{Int64})
    @ GRIBDatasets C:\Users\tsh371\.julia\packages\GRIBDatasets\JAZIL\src\variables.jl:136
  [2] readblock!(A::Rasters.ModifiedDiskArray{…}, out_block::Vector{…}, I::UnitRange{…})
    @ Rasters C:\Users\tsh371\.julia\dev\Rasters\src\modifieddiskarray.jl:99
  [3] readblock_checked!(a::Rasters.ModifiedDiskArray{…}, out::Vector{…}, i::UnitRange{…})
    @ DiskArrays C:\Users\tsh371\.julia\dev\DiskArrays\src\indexing.jl:290
  [4] getindex_disk_nobatch!(out::Nothing, a::Rasters.ModifiedDiskArray{…}, i::Tuple{…})
    @ DiskArrays C:\Users\tsh371\.julia\dev\DiskArrays\src\indexing.jl:102
  [5] getindex_disk!
    @ C:\Users\tsh371\.julia\dev\DiskArrays\src\indexing.jl:65 [inlined]
  [6] getindex_disk
    @ C:\Users\tsh371\.julia\dev\DiskArrays\src\indexing.jl:58 [inlined]
  [7] getindex
    @ C:\Users\tsh371\.julia\dev\DiskArrays\src\indexing.jl:308 [inlined]
  [8] _disk_collect
    @ C:\Users\tsh371\.julia\dev\DiskArrays\src\array.jl:2 [inlined]
  [9] Array(a::Rasters.ModifiedDiskArray{Float64, 1, GRIBDatasets.Variable{…}, Rasters.NoMod{…}})
    @ DiskArrays C:\Users\tsh371\.julia\dev\DiskArrays\src\array.jl:54
 [10] (::Rasters.var"#77#78"{…})(var::GRIBDatasets.Variable{…})
    @ Rasters C:\Users\tsh371\.julia\dev\Rasters\src\array.jl:381
 [11] _open(f::Rasters.var"#77#78"{…}, ::Rasters.GRIBsource, var::GRIBDatasets.Variable{…}; mod::Nothing, kw::@Kwargs{})
    @ Rasters C:\Users\tsh371\.julia\dev\Rasters\src\sources\commondatamodel.jl:94
 [12] _open(f::Rasters.var"#77#78"{…}, source::Rasters.GRIBsource, ds::GRIBDataset{…}; name::Symbol, group::Rasters.NoKW, mod::Nothing, kw::@Kwargs{})
    @ Rasters C:\Users\tsh371\.julia\dev\Rasters\src\sources\commondatamodel.jl:91
 [13] _raster(ds::GRIBDataset{…}; dims::Rasters.NoKW, refdims::Tuple{}, name::Rasters.NoKW, group::Rasters.NoKW, filename::String, metadata::Rasters.NoKW, missingval::Rasters.NoKW, crs::Rasters.NoKW, mappedcrs::Rasters.NoKW, source::Rasters.GRIBsource, replace_missing::Rasters.NoKW, coerce::typeof(convert), scaled::Rasters.NoKW, verbose::Bool, write::Bool, lazy::Bool, dropband::Bool, checkmem::Bool, raw::Bool, mod::Rasters.NoKW, f::typeof(identity))
    @ Rasters C:\Users\tsh371\.julia\dev\Rasters\src\array.jl:365
 [14] _raster
    @ C:\Users\tsh371\.julia\dev\Rasters\src\array.jl:334 [inlined]
 [15] #73
    @ C:\Users\tsh371\.julia\dev\Rasters\src\array.jl:328 [inlined]
 [16] _open(f::Rasters.var"#73#74"{…}, source::Rasters.GRIBsource, ds::GRIBDataset{…}; name::Rasters.NoKW, group::Nothing, mod::Nothing, kw::@Kwargs{})
    @ Rasters C:\Users\tsh371\.julia\dev\Rasters\src\sources\commondatamodel.jl:87
 [17] _open(f::Function, source::Rasters.GRIBsource, filename::String; write::Bool, kw::@Kwargs{mod::Nothing})
    @ Rasters C:\Users\tsh371\.julia\dev\Rasters\src\sources\commondatamodel.jl:77
 [18] _open
    @ C:\Users\tsh371\.julia\dev\Rasters\src\sources\commondatamodel.jl:74 [inlined]
 [19] #_open#21
    @ C:\Users\tsh371\.julia\dev\Rasters\src\sources\sources.jl:110 [inlined]
 [20] Raster(filename::String; source::Rasters.NoKW, kw::@Kwargs{})
    @ Rasters C:\Users\tsh371\.julia\dev\Rasters\src\array.jl:327
 [21] Raster(filename::String)
    @ Rasters C:\Users\tsh371\.julia\dev\Rasters\src\array.jl:325
 [22] top-level scope
    @ REPL[5]:1
Some type information was truncated. Use `show(err)` to see complete types.

Specifying name works - no errors when reading the data. At some point we'd have the lon/lat values in a merged lookup, I suppose?

julia> Raster(filepath; lazy = true, name = :msl)
┌ 338880×744 Raster{Union{Missing, Float64}, 2} msl ┐
├───────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────────────────────────────── dims ┐
  ↓ values,
  → Ti     Sampled{Dates.DateTime} [1980-12-01T00:00:00, …, 1980-12-31T23:00:00] ForwardOrdered Irregular Points
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── metadata ┤
  Metadata{Rasters.GRIBsource} of Dict{String, Any} with 18 entries:
  "dataType"                  => "an"
  "stepUnits"                 => "1"
  "name"                      => "Mean sea level pressure"
  "N"                         => "320"
  "numberOfPoints"            => "338880"
  "missingValue"              => "9999"
  "gridType"                  => "reduced_gg"
  "gridDefinitionDescription" => "Gaussian Latitude/Longitude Grid"
  "NV"                        => "0"
  "typeOfLevel"               => "surface"
  "pl"                        => "(18, 25, 36, 40, 45, 50, 60, 64, 72, 72, 75, 81, 90, 96, 100, 108, 120, 120, 125, 135, 144, 144, 150, 160, 180, …
  "shortName"                 => "msl"
  "stepType"                  => "instant"
  "units"                     => "Pa"
  "totalNumber"               => "0"
  "paramId"                   => "151"
  "cfName"                    => "air_pressure_at_mean_sea_level"
  "cfVarName"                 => "msl"
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── raster ┤
  missingval: missing
  extent: Extent(values = (1, 338880), Ti = (Dates.DateTime("1980-12-01T00:00:00"), Dates.DateTime("1980-12-31T23:00:00")))
  filename: C:/Users/tsh371/Downloads/ERA5_N320_1h_sl_sfc_NH15S_1980-12-01_1980-12-31_msl.grib
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

@rafaqz
Copy link
Owner

rafaqz commented May 7, 2025

Maybe we need a condional to check if the parent is already an Array and then skip collecting it. Better for memory used anyway

@aasdelat
Copy link

aasdelat commented May 7, 2025

If I just call Raster on this file, it tries to read the longitude values.

I have to say that, in these files, longitude and latitude are indeed variables. In this case they are 1 dimensional vectors, but in other cases, they can be 2 dimensional matrices. So, msl could have, instead of one spatial dimension (values), tow ones (that are often named x and y), and, hence, the variables longitude and latitude would also be two dimensional.

In order to make a spatial selection, I use the Where selector for the msl variable, on the longitude variable:

using Rasters
using IntervalSets

dataset = RasterStack(filename, lazy=true)
spatial_dim = [ d for d in dims(dataset["msl"]) if name(d) == "values"][1]
spatial_selection = spatial_dim[Where( c -> dataset["longitude"][c]  -10..20 &&
                                       dataset["latitude"][c]    35..60
                                       )]
ds = view(   dataset, values = Where(c -> c in spatial_selection)   )

@tiemvanderdeure
Copy link
Collaborator Author

Maybe we need a condional to check if the parent is already an Array and then skip collecting it.

Wouldn't that have to be in GRIBDatasets? Because here we have a ModifiedDiskArray, whose parent is a GRIB Variable, whose parent is an Array.

But I don't understand how we end up with an Array as the parent - the code must read and rewrap somewhere?

@tiemvanderdeure
Copy link
Collaborator Author

In order to make a spatial selection, I use the Where selector for the msl variable, on the longitude variable:

This works, but the syntax really isn't great. So the goal is to make this work in a way that you can Raster on your dataset, it automatically detects that msl is the variable of interest, and that the values have longitude and latitude associated with them and stores that in a special type of lookup/dimension. Then you can do something like:

msl = Raster(datapath)
msl[X = -10..20, Y = 35..60]

@rafaqz
Copy link
Owner

rafaqz commented May 7, 2025

We can also make those selections much faster than Where using a tree lookup or similar

@aasdelat
Copy link

aasdelat commented May 7, 2025

We can also make those selections much faster than Where using a tree lookup or similar

Are you speaking about the internals of Rasters or is it something that I can do in my script? In the later case, How?

@rafaqz
Copy link
Owner

rafaqz commented May 7, 2025

Raster internals, but you can equally do it in a script.

We would make a spatial tree lookup like with SortTileRecursiveTree.jl or similar for points.

Then points will look through the tree by spatial extent rather than reading through every one and comparing it.

If the search space is large it's much faster, but also you need to do more than one selection for it to pay off as making the tree is also expensive

@tiemvanderdeure
Copy link
Collaborator Author

I dove a little bit deeper into this readblock! error - and it again has got to do with the way CommonDataModel interacts with DiskArrays.

Each variable in a GRIB Dataset is a CDM.CFVariable, and therefore returns false for isdisk, but the parents, which are GRIBDataset.Variable, return true.

However, that seems to be kind of a lie for the longitude and latitude variables, which are wrappers around a Vector, so really they are not isdisk at all.

This all kind of works out because of the getindex implementations in CommonDataModel, but now that we always wrap diskarrays in ModifiedDiskArray, we actually treat it as a diskarray (i.e. call readblock!), which it says it is but really is not.

julia> ds[:msl] |> parent |> typeof
GRIBDatasets.Variable{Float64, 2, GRIBDatasets.DiskValues{Float64, 2, 1}, GRIBDataset{Float64, 2, Missing}}

julia> ds[:longitude] |> parent |> typeof
GRIBDatasets.Variable{Float64, 1, Vector{Float64}, GRIBDataset{Float64, 2, Missing}}

I don't know why GRIBDatasets always eagerly reads these dimensions. For us this is very annoying for cases like these, where they actually are big, since any time open is called on a Raster like this, we call GRIBDataset on the file, which actually has some overhead (~3 seconds on my laptop) as it is reading in millions of lon/lat values.

@rafaqz
Copy link
Owner

rafaqz commented May 8, 2025

Ah right. Grib has a pretty weird design but it would be good if GribDataset just loaded the metadata...

@tiemvanderdeure
Copy link
Collaborator Author

I suppose this should be addressed at the GRIBDatasets end, right? And some of the changes to CDM might improve this as well?

I'm a little bit out of my depth here :)

@rafaqz
Copy link
Owner

rafaqz commented May 8, 2025

Yeah, but I'm not sure its easy or if anyone can address it... probably we need a custom dispatch her for now that just uses the parent object directly

@tcarion any thoughts on this?

@aasdelat
Copy link

aasdelat commented May 8, 2025

Closes #964

@aasdelat can you confirm this fixes your issue? I don't have a good grib file to test on.

Sorry, forgive me my ignorance about git, but, How can I install this version of the package in order to test it? I have tested the following:

(@Rasters) pkg> add https://github.com/rafaqz/Rasters.jl#966
     Cloning git-repo `https://github.com/rafaqz/Rasters.jl`
ERROR: GitError(Code:EAMBIGUOUS, Class:Object, ambiguous lookup - OID prefix is too short)
Stacktrace:
  [1] macro expansion
    @ ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/LibGit2/src/error.jl:115 [inlined]
  [2] LibGit2.GitObject(repo::LibGit2.GitRepo, spec::String)
    @ LibGit2 ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/LibGit2/src/repository.jl:142
  [3] get_object_or_branch(repo::LibGit2.GitRepo, rev::String)
    @ Pkg.Types ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/Pkg/src/Types.jl:901
  [4] (::Pkg.Types.var"#58#59"{Pkg.Types.Context, Pkg.Types.PackageSpec, String})(repo::LibGit2.GitRepo)
    @ Pkg.Types ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/Pkg/src/Types.jl:799
  [5] with(f::Pkg.Types.var"#58#59"{Pkg.Types.Context, Pkg.Types.PackageSpec, String}, obj::LibGit2.GitRepo)
    @ LibGit2 ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/LibGit2/src/types.jl:1166
  [6] handle_repo_add!(ctx::Pkg.Types.Context, pkg::Pkg.Types.PackageSpec)
    @ Pkg.Types ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/Pkg/src/Types.jl:790
  [7] handle_repos_add!(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec})
    @ Pkg.Types ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/Pkg/src/Types.jl:860
  [8] add(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; preserve::Pkg.Types.PreserveLevel, platform::Base.BinaryPlatforms.Platform, target::Symbol, allow_autoprecomp::Bool, kwargs::@Kwargs{io::IOContext{IO}})
    @ Pkg.API ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/Pkg/src/API.jl:291
  [9] add(pkgs::Vector{Pkg.Types.PackageSpec}; io::IOContext{IO}, kwargs::@Kwargs{})
    @ Pkg.API ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/Pkg/src/API.jl:159
 [10] add(pkgs::Vector{Pkg.Types.PackageSpec})
    @ Pkg.API ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/Pkg/src/API.jl:148
 [11] do_cmd(command::Pkg.REPLMode.Command, io::Base.TTY)
    @ Pkg.REPLMode ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/Pkg/src/REPLMode/REPLMode.jl:407
 [12] do_cmds(commands::Vector{Pkg.REPLMode.Command}, io::Base.TTY)
    @ Pkg.REPLMode ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/Pkg/src/REPLMode/REPLMode.jl:393
 [13] do_cmds(repl::REPL.LineEditREPL, commands::String)
    @ REPLExt ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/Pkg/ext/REPLExt/REPLExt.jl:92
 [14] on_done(s::REPL.LineEdit.MIState, buf::IOBuffer, ok::Bool, repl::REPL.LineEditREPL)
    @ REPLExt ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/Pkg/ext/REPLExt/REPLExt.jl:106
 [15] #invokelatest#2
    @ ./essentials.jl:1055 [inlined]
 [16] invokelatest
    @ ./essentials.jl:1052 [inlined]
 [17] (::REPLExt.var"#47#50"{REPL.LineEditREPL})(s::REPL.LineEdit.MIState, buf::IOBuffer, ok::Bool)
    @ REPLExt ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/Pkg/ext/REPLExt/REPLExt.jl:128
 [18] #invokelatest#2
    @ ./essentials.jl:1055 [inlined]
 [19] invokelatest
    @ ./essentials.jl:1052 [inlined]
 [20] run_interface(terminal::REPL.Terminals.TextTerminal, m::REPL.LineEdit.ModalInterface, s::REPL.LineEdit.MIState)
    @ REPL.LineEdit ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/LineEdit.jl:2755
 [21] run_frontend(repl::REPL.LineEditREPL, backend::REPL.REPLBackendRef)
    @ REPL ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/REPL.jl:1506
 [22] (::REPL.var"#79#85"{REPL.LineEditREPL, REPL.REPLBackendRef})()
    @ REPL ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/REPL/src/REPL.jl:497

@rafaqz
Copy link
Owner

rafaqz commented May 8, 2025

@tiemvanderdeure maybe start pushing to rafaqz/Rasters with branch names like tvd/branchname? It makes it easier for people to try your branches

@aasdelat you would normally just in Julia do ] add Rasters#branchname and julia handles it for you, which is kind of miraculously easy.

But this is still not too bad ] add https://github.com/tiemvanderdeure/Rasters.jl#grib_missingval

Then afterwards just ] add Rasters again to reset.

So either way no git needed, Julia handles it all

@aasdelat
Copy link

aasdelat commented May 8, 2025

But this is still not too bad ] add https://github.com/tiemvanderdeure/Rasters.jl#grib_missingval

Thank you, @rafaqz , this worked. But I get an error that I think is the same that @tiemvanderdeure found:

dataset = RasterStack(filepath, lazy=true)
# Works with no problem, but:
julia> ds = read(dataset)
ERROR: MethodError: no method matching readblock!(::Vector{Float64}, ::Vector{Float64}, ::UnitRange{Int64})
The function `readblock!` exists, but no method is defined for this combination of argument types.

Closest candidates are:
  readblock!(::AbstractRaster, ::Any, ::AbstractUnitRange...)
   @ Rasters ~/.julia/packages/Rasters/XNB8k/src/array.jl:112
  readblock!(::DiskArrays.ConcatDiskArray, ::Any, ::AbstractUnitRange...)
   @ DiskArrays ~/.julia/packages/DiskArrays/WaxIm/src/cat.jl:84
  readblock!(::DiskArrays.ReshapedDiskArray, ::Any, ::OrdinalRange...)
   @ DiskArrays ~/.julia/packages/DiskArrays/WaxIm/src/reshape.jl:44
  ...

Stacktrace:
  [1] readblock!(A::GRIBDatasets.Variable{…}, aout::Vector{…}, i::UnitRange{…})
    @ GRIBDatasets ~/.julia/packages/GRIBDatasets/JAZIL/src/variables.jl:136
  [2] readblock!(A::Rasters.ModifiedDiskArray{…}, outer_block::Vector{…}, I::UnitRange{…})
    @ Rasters ~/.julia/packages/Rasters/XNB8k/src/modifieddiskarray.jl:85
  [3] readblock_checked!(a::Rasters.ModifiedDiskArray{…}, out::Vector{…}, i::UnitRange{…})
    @ DiskArrays ~/.julia/packages/DiskArrays/WaxIm/src/indexing.jl:290
  [4] getindex_disk_nobatch!(out::Nothing, a::Rasters.ModifiedDiskArray{…}, i::Tuple{…})
    @ DiskArrays ~/.julia/packages/DiskArrays/WaxIm/src/indexing.jl:102
  [5] getindex_disk!
    @ ~/.julia/packages/DiskArrays/WaxIm/src/indexing.jl:65 [inlined]
  [6] getindex_disk
    @ ~/.julia/packages/DiskArrays/WaxIm/src/indexing.jl:58 [inlined]
  [7] getindex
    @ ~/.julia/packages/DiskArrays/WaxIm/src/indexing.jl:308 [inlined]
  [8] _disk_collect
    @ ~/.julia/packages/DiskArrays/WaxIm/src/array.jl:2 [inlined]
  [9] Array(a::Rasters.ModifiedDiskArray{Union{…}, 1, GRIBDatasets.Variable{…}, Rasters.Mod{…}})
    @ DiskArrays ~/.julia/packages/DiskArrays/WaxIm/src/array.jl:54
 [10] (::Rasters.var"#45#46"{})(O::Raster{…})
    @ Rasters ~/.julia/packages/Rasters/XNB8k/src/array.jl:96
 [11] #53
    @ ~/.julia/packages/Rasters/XNB8k/src/array.jl:171 [inlined]
 [12] _open(f::Rasters.var"#53#54"{}, ::Rasters.GRIBsource, var::GRIBDatasets.Variable{…}; mod::Rasters.Mod{…}, kw::@Kwargs{})
    @ Rasters ~/.julia/packages/Rasters/XNB8k/src/sources/commondatamodel.jl:94
 [13] _open(f::Rasters.var"#53#54"{}, source::Rasters.GRIBsource, ds::GRIBDatasets.GRIBDataset{…}; name::Symbol, group::Nothing, mod::Rasters.Mod{…}, kw::@Kwargs{})
    @ Rasters ~/.julia/packages/Rasters/XNB8k/src/sources/commondatamodel.jl:91
 [14] _open(f::Function, source::Rasters.GRIBsource, filename::String; write::Bool, kw::@Kwargs{})
    @ Rasters ~/.julia/packages/Rasters/XNB8k/src/sources/commondatamodel.jl:77
 [15] _open
    @ ~/.julia/packages/Rasters/XNB8k/src/sources/commondatamodel.jl:74 [inlined]
 [16] #open#34
    @ ~/.julia/packages/Rasters/XNB8k/src/filearray.jl:65 [inlined]
 [17] open
    @ ~/.julia/packages/Rasters/XNB8k/src/filearray.jl:64 [inlined]
 [18] #_open_one#52
    @ ~/.julia/packages/Rasters/XNB8k/src/array.jl:168 [inlined]
 [19] _open_one
    @ ~/.julia/packages/Rasters/XNB8k/src/array.jl:167 [inlined]
 [20] #open#51
    @ ~/.julia/packages/Rasters/XNB8k/src/array.jl:157 [inlined]
 [21] open
    @ ~/.julia/packages/Rasters/XNB8k/src/array.jl:149 [inlined]
 [22] modify(f::Type{…}, A::Raster{…})
    @ Rasters ~/.julia/packages/Rasters/XNB8k/src/array.jl:95
 [23] (::DimensionalData.var"#625#626"{})(a::Raster{…})
    @ DimensionalData ~/.julia/packages/DimensionalData/wQISR/src/utils.jl:88
 [24] macro expansion
    @ ~/.julia/packages/DimensionalData/wQISR/src/utils.jl:236 [inlined]
 [25] unrolled_map(f::DimensionalData.var"#625#626"{UnionAll}, v::Tuple{Raster{…}, Raster{…}, Raster{…}})
    @ DimensionalData ~/.julia/packages/DimensionalData/wQISR/src/utils.jl:236
 [26] maplayers(f::Function, s::RasterStack{…})
    @ DimensionalData ~/.julia/packages/DimensionalData/wQISR/src/stack/stack.jl:232
 [27] modify
    @ ~/.julia/packages/DimensionalData/wQISR/src/utils.jl:88 [inlined]
 [28] #read#301
    @ ~/.julia/packages/Rasters/XNB8k/src/read.jl:16 [inlined]
 [29] read(x::RasterStack{…})
    @ Rasters ~/.julia/packages/Rasters/XNB8k/src/read.jl:12
 [30] top-level scope
    @ REPL[8]:1
Some type information was truncated. Use `show(err)` to see complete types.

@rafaqz
Copy link
Owner

rafaqz commented May 8, 2025

Yeah, we are discussing this. GRIBDatasets implementation of disk arrays is a bit weird

@tcarion
Copy link
Contributor

tcarion commented May 9, 2025

Hi everyone,

Sorry I needed time to get back into it.

I don't know why GRIBDatasets always eagerly reads these dimensions. For us this is very annoying for cases like these, where they actually are big, since any time open is called on a Raster like this, we call GRIBDataset on the file, which actually has some overhead (~3 seconds on my laptop) as it is reading in millions of lon/lat values.

This is because of the (quite unusual) way data are stored in GRIB files, and the way GRIB.jl is designed to access them. GRIB files are composed of so-called "messages" containing the data along with metadata defining its coordinates. The way how horizontal coordinates and other coordinates are accessed are inherently different, because only the horizontal coordinates data are self-contained in one message. To illustrate this:

julia> grib = GribFile(filepath);
# There's one message for each 744 time step
julia> msg = Message(grib);
julia> lons, lats, values = data(msg)
julia> size.((lons, lats, values))
((338880,), (338880,), (338880,))

With one message, we have the full lookup for the horizontal dimensions. For the other dimensions, we need to go through all the messages to build up the lookup for the other dimensions. That's what's happening when you do:

ds = GRIBDataset(filepath)

It goes through all the messages to build the coordinates lookup by reading the metadata, without accessing the actual data of the messages, because it would mean expensive disk access. It only reads the first message to infer the data type, and takes this opportunity to load the horizontal coordinates into memory, so we don't need to open the file again when accessing them. This is why "longitude" and "latitude" Variables parents are Array's and not DiskArrays.

But anyway, I don't think this is causing the issue here. I think the problem is that GRIBDataset is weirdly handling GRIB files defined on some grid types as shown in details below. I think it actually doesn't follow the CF conventions, which, if I understand well, Rasters is getting more and more fussy about 😃 I will try to investigate this.

for testfile in test_files
    index = FileIndex(testfile)
    print("Testing gridType: $(first(index["gridType"]))")
    try 
        ds = RasterStack(testfile)
        println()
    catch e
        println(" - Rasters failed !!!")
    end
end

This outputs:

Testing gridType: regular_ll
Testing gridType: regular_ll
Testing gridType: regular_ll
Testing gridType: lambert - Rasters failed !!!
Testing gridType: regular_ll
Testing gridType: regular_gg
Testing gridType: regular_gg
Testing gridType: regular_gg
Testing gridType: regular_gg
Testing gridType: regular_ll
Testing gridType: regular_ll
Testing gridType: regular_ll
Testing gridType: regular_ll
Testing gridType: regular_ll
Testing gridType: reduced_gg - Rasters failed !!!

@tiemvanderdeure
Copy link
Collaborator Author

On this branch and on GRIBDatasets master I can read all of these files without errors. We could totally improve on how they are read, but being able to read at all must be an improvement?

The lambert grid is tricky, another candidate for a MergedLookup, I suppose?

julia> joinpath(gribexamples_dir, "lambert_grid.grib") |> RasterStack
┌ 475×475×1×1 RasterStack ┐
├─────────────────────────┴───────────────────────────────────────────────────────────────────────────── dims ┐
  ↓ X ,
  → Y ,
  ↗ Z  Sampled{Int32} [0] ForwardOrdered Regular Points,
  ⬔ Ti Sampled{DateTime} [DateTime("1990-01-25T18:00:00")] ForwardOrdered Irregular Points
├───────────────────────────────────────────────────────────────────────────────────────────────────── layers ┤
  :longitude eltype: Union{Missing, Float64} dims: X, Y size: 475×475
  :latitude  eltype: Union{Missing, Float64} dims: X, Y size: 475×475
  :nlwrs     eltype: Union{Missing, Float64} dims: X, Y, Z, Ti size: 475×475×1×1
├─────────────────────────────────────────────────────────────────────────────────────────────────── metadata ┤
  Metadata{GRIBsource} of Dict{String, Any} with 6 entries:
  "edition"           => "1"
  "source"            => "C:\\Users\\tsh371\\.julia\\dev\\GRIBDatasets\\test\\sample-data\\lambert_grid.grib"  
  "centreDescription" => "Athens"
  "centre"            => "96"
  "subCentre"         => "99"
  "Conventions"       => "CF-1.7"
├───────────────────────────────────────────────────────────────────────────────────────────────────── raster ┤
  missingval: missing
  extent: Extent(X = (1, 475), Y = (1, 475), Z = (0, 0), Ti = (DateTime("1990-01-25T18:00:00"), DateTime("1990-01-25T18:00:00")))
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

@tcarion
Copy link
Contributor

tcarion commented May 12, 2025

You're right, sorry, I was using the wrong Rasters branch!

@asinghvi17
Copy link
Collaborator

Those should be ArrayLike linked lookups

@rafaqz
Copy link
Owner

rafaqz commented May 12, 2025

I've realised we need to make all of this trait based now .

My current plan is to expand Mapped lookups so it can hold both linear and rotated lookups at the same time, and use traits to trigger the grouping. So you always have one lookup in the underlying crs, and one with values mapped to another projection. It will be pretty breaking.

But CF actually allows these overlayed lookups and we should properly support them.

Mapped up until now just let's you say it's in another projection underneath, but we will use the lookup in latlon instead. But we should allow both.

Just need to figure which ones get to be called X and Y proper... Probably whichever has X and Y axis marked in the CF attributes. Which may break some workflows.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RasterStack() fails to load grib files
5 participants