-
Notifications
You must be signed in to change notification settings - Fork 38
Fix handling of missing values in GRIB files #966
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
rafaqz/DimensionalData.jl#990 introduced a method ambiguity in Rasters. Can fix this by adding a dispatch but it would be nice to have a better fix for this so we don't have to keep adding dispatches |
You can download two of them from here: |
Now I get a different error, because the parent of the GRIBDataset.Variable is a |
Okay these files have geometry lookups, but these are not detected so longitude is read in as the values by default. On this branch this works - but you have to specify name and the lon/lat values aren't read: ras = Raster(filepath; lazy = true, name = :msl)
ras[1] @asinghvi17 this might work very soon with your work on geometry lookups? |
the CF branch I'm working on might read this properly now, if you want to check (geometry lookups are being rolled into just doing most of CF properly as a pretty massive rewrite of the common data model internals) |
Also for the record I don't really like that we are treating missingval separately for each dataset type, CommonDataModel.jl should have these methods instead at some stage... But fixing it here makes sense for now |
@tiemvanderdeure is this working now? could you share the error output? |
On
|
Bit of a more detailed report: Let me just share the output of
If I just call Raster on this file, it tries to read the
But eagerly reading this doesn't work, because
Specifying
|
Maybe we need a condional to check if the parent is already an Array and then skip collecting it. Better for memory used anyway |
I have to say that, in these files, In order to make a spatial selection, I use the Where selector for the msl variable, on the using Rasters
using IntervalSets
dataset = RasterStack(filename, lazy=true)
spatial_dim = [ d for d in dims(dataset["msl"]) if name(d) == "values"][1]
spatial_selection = spatial_dim[Where( c -> dataset["longitude"][c] ∈ -10..20 &&
dataset["latitude"][c] ∈ 35..60
)]
ds = view( dataset, values = Where(c -> c in spatial_selection) ) |
Wouldn't that have to be in GRIBDatasets? Because here we have a ModifiedDiskArray, whose parent is a GRIB Variable, whose parent is an Array. But I don't understand how we end up with an Array as the parent - the code must read and rewrap somewhere? |
This works, but the syntax really isn't great. So the goal is to make this work in a way that you can Raster on your dataset, it automatically detects that msl is the variable of interest, and that the values have longitude and latitude associated with them and stores that in a special type of lookup/dimension. Then you can do something like: msl = Raster(datapath)
msl[X = -10..20, Y = 35..60] |
We can also make those selections much faster than Where using a tree lookup or similar |
Are you speaking about the internals of Rasters or is it something that I can do in my script? In the later case, How? |
Raster internals, but you can equally do it in a script. We would make a spatial tree lookup like with SortTileRecursiveTree.jl or similar for points. Then points will look through the tree by spatial extent rather than reading through every one and comparing it. If the search space is large it's much faster, but also you need to do more than one selection for it to pay off as making the tree is also expensive |
I dove a little bit deeper into this readblock! error - and it again has got to do with the way CommonDataModel interacts with DiskArrays. Each variable in a GRIB Dataset is a CDM.CFVariable, and therefore returns However, that seems to be kind of a lie for the longitude and latitude variables, which are wrappers around a Vector, so really they are not This all kind of works out because of the
I don't know why GRIBDatasets always eagerly reads these dimensions. For us this is very annoying for cases like these, where they actually are big, since any time |
Ah right. Grib has a pretty weird design but it would be good if GribDataset just loaded the metadata... |
I suppose this should be addressed at the GRIBDatasets end, right? And some of the changes to CDM might improve this as well? I'm a little bit out of my depth here :) |
Yeah, but I'm not sure its easy or if anyone can address it... probably we need a custom dispatch her for now that just uses the parent object directly @tcarion any thoughts on this? |
Sorry, forgive me my ignorance about git, but, How can I install this version of the package in order to test it? I have tested the following:
|
@tiemvanderdeure maybe start pushing to rafaqz/Rasters with branch names like @aasdelat you would normally just in Julia do But this is still not too bad Then afterwards just So either way no git needed, Julia handles it all |
Thank you, @rafaqz , this worked. But I get an error that I think is the same that @tiemvanderdeure found: dataset = RasterStack(filepath, lazy=true)
# Works with no problem, but:
julia> ds = read(dataset)
ERROR: MethodError: no method matching readblock!(::Vector{Float64}, ::Vector{Float64}, ::UnitRange{Int64})
The function `readblock!` exists, but no method is defined for this combination of argument types.
Closest candidates are:
readblock!(::AbstractRaster, ::Any, ::AbstractUnitRange...)
@ Rasters ~/.julia/packages/Rasters/XNB8k/src/array.jl:112
readblock!(::DiskArrays.ConcatDiskArray, ::Any, ::AbstractUnitRange...)
@ DiskArrays ~/.julia/packages/DiskArrays/WaxIm/src/cat.jl:84
readblock!(::DiskArrays.ReshapedDiskArray, ::Any, ::OrdinalRange...)
@ DiskArrays ~/.julia/packages/DiskArrays/WaxIm/src/reshape.jl:44
...
Stacktrace:
[1] readblock!(A::GRIBDatasets.Variable{…}, aout::Vector{…}, i::UnitRange{…})
@ GRIBDatasets ~/.julia/packages/GRIBDatasets/JAZIL/src/variables.jl:136
[2] readblock!(A::Rasters.ModifiedDiskArray{…}, outer_block::Vector{…}, I::UnitRange{…})
@ Rasters ~/.julia/packages/Rasters/XNB8k/src/modifieddiskarray.jl:85
[3] readblock_checked!(a::Rasters.ModifiedDiskArray{…}, out::Vector{…}, i::UnitRange{…})
@ DiskArrays ~/.julia/packages/DiskArrays/WaxIm/src/indexing.jl:290
[4] getindex_disk_nobatch!(out::Nothing, a::Rasters.ModifiedDiskArray{…}, i::Tuple{…})
@ DiskArrays ~/.julia/packages/DiskArrays/WaxIm/src/indexing.jl:102
[5] getindex_disk!
@ ~/.julia/packages/DiskArrays/WaxIm/src/indexing.jl:65 [inlined]
[6] getindex_disk
@ ~/.julia/packages/DiskArrays/WaxIm/src/indexing.jl:58 [inlined]
[7] getindex
@ ~/.julia/packages/DiskArrays/WaxIm/src/indexing.jl:308 [inlined]
[8] _disk_collect
@ ~/.julia/packages/DiskArrays/WaxIm/src/array.jl:2 [inlined]
[9] Array(a::Rasters.ModifiedDiskArray{Union{…}, 1, GRIBDatasets.Variable{…}, Rasters.Mod{…}})
@ DiskArrays ~/.julia/packages/DiskArrays/WaxIm/src/array.jl:54
[10] (::Rasters.var"#45#46"{…})(O::Raster{…})
@ Rasters ~/.julia/packages/Rasters/XNB8k/src/array.jl:96
[11] #53
@ ~/.julia/packages/Rasters/XNB8k/src/array.jl:171 [inlined]
[12] _open(f::Rasters.var"#53#54"{…}, ::Rasters.GRIBsource, var::GRIBDatasets.Variable{…}; mod::Rasters.Mod{…}, kw::@Kwargs{})
@ Rasters ~/.julia/packages/Rasters/XNB8k/src/sources/commondatamodel.jl:94
[13] _open(f::Rasters.var"#53#54"{…}, source::Rasters.GRIBsource, ds::GRIBDatasets.GRIBDataset{…}; name::Symbol, group::Nothing, mod::Rasters.Mod{…}, kw::@Kwargs{})
@ Rasters ~/.julia/packages/Rasters/XNB8k/src/sources/commondatamodel.jl:91
[14] _open(f::Function, source::Rasters.GRIBsource, filename::String; write::Bool, kw::@Kwargs{…})
@ Rasters ~/.julia/packages/Rasters/XNB8k/src/sources/commondatamodel.jl:77
[15] _open
@ ~/.julia/packages/Rasters/XNB8k/src/sources/commondatamodel.jl:74 [inlined]
[16] #open#34
@ ~/.julia/packages/Rasters/XNB8k/src/filearray.jl:65 [inlined]
[17] open
@ ~/.julia/packages/Rasters/XNB8k/src/filearray.jl:64 [inlined]
[18] #_open_one#52
@ ~/.julia/packages/Rasters/XNB8k/src/array.jl:168 [inlined]
[19] _open_one
@ ~/.julia/packages/Rasters/XNB8k/src/array.jl:167 [inlined]
[20] #open#51
@ ~/.julia/packages/Rasters/XNB8k/src/array.jl:157 [inlined]
[21] open
@ ~/.julia/packages/Rasters/XNB8k/src/array.jl:149 [inlined]
[22] modify(f::Type{…}, A::Raster{…})
@ Rasters ~/.julia/packages/Rasters/XNB8k/src/array.jl:95
[23] (::DimensionalData.var"#625#626"{…})(a::Raster{…})
@ DimensionalData ~/.julia/packages/DimensionalData/wQISR/src/utils.jl:88
[24] macro expansion
@ ~/.julia/packages/DimensionalData/wQISR/src/utils.jl:236 [inlined]
[25] unrolled_map(f::DimensionalData.var"#625#626"{UnionAll}, v::Tuple{Raster{…}, Raster{…}, Raster{…}})
@ DimensionalData ~/.julia/packages/DimensionalData/wQISR/src/utils.jl:236
[26] maplayers(f::Function, s::RasterStack{…})
@ DimensionalData ~/.julia/packages/DimensionalData/wQISR/src/stack/stack.jl:232
[27] modify
@ ~/.julia/packages/DimensionalData/wQISR/src/utils.jl:88 [inlined]
[28] #read#301
@ ~/.julia/packages/Rasters/XNB8k/src/read.jl:16 [inlined]
[29] read(x::RasterStack{…})
@ Rasters ~/.julia/packages/Rasters/XNB8k/src/read.jl:12
[30] top-level scope
@ REPL[8]:1
Some type information was truncated. Use `show(err)` to see complete types. |
Yeah, we are discussing this. GRIBDatasets implementation of disk arrays is a bit weird |
Hi everyone, Sorry I needed time to get back into it.
This is because of the (quite unusual) way data are stored in GRIB files, and the way julia> grib = GribFile(filepath);
# There's one message for each 744 time step
julia> msg = Message(grib);
julia> lons, lats, values = data(msg)
julia> size.((lons, lats, values))
((338880,), (338880,), (338880,)) With one message, we have the full lookup for the horizontal dimensions. For the other dimensions, we need to go through all the messages to build up the lookup for the other dimensions. That's what's happening when you do: ds = GRIBDataset(filepath) It goes through all the messages to build the coordinates lookup by reading the metadata, without accessing the actual data of the messages, because it would mean expensive disk access. It only reads the first message to infer the data type, and takes this opportunity to load the horizontal coordinates into memory, so we don't need to open the file again when accessing them. This is why But anyway, I don't think this is causing the issue here. I think the problem is that for testfile in test_files
index = FileIndex(testfile)
print("Testing gridType: $(first(index["gridType"]))")
try
ds = RasterStack(testfile)
println()
catch e
println(" - Rasters failed !!!")
end
end This outputs: Testing gridType: regular_ll
Testing gridType: regular_ll
Testing gridType: regular_ll
Testing gridType: lambert - Rasters failed !!!
Testing gridType: regular_ll
Testing gridType: regular_gg
Testing gridType: regular_gg
Testing gridType: regular_gg
Testing gridType: regular_gg
Testing gridType: regular_ll
Testing gridType: regular_ll
Testing gridType: regular_ll
Testing gridType: regular_ll
Testing gridType: regular_ll
Testing gridType: reduced_gg - Rasters failed !!! |
On this branch and on GRIBDatasets master I can read all of these files without errors. We could totally improve on how they are read, but being able to read at all must be an improvement? The lambert grid is tricky, another candidate for a MergedLookup, I suppose?
|
You're right, sorry, I was using the wrong Rasters branch! |
Those should be ArrayLike linked lookups |
I've realised we need to make all of this trait based now . My current plan is to expand But CF actually allows these overlayed lookups and we should properly support them. Mapped up until now just let's you say it's in another projection underneath, but we will use the lookup in latlon instead. But we should allow both. Just need to figure which ones get to be called X and Y proper... Probably whichever has X and Y axis marked in the CF attributes. Which may break some workflows. |
Closes #964
@aasdelat can you confirm this fixes your issue? I don't have a good grib file to test on.