Labels
bug (Something isn't working), evaluation (anything related to the model evaluation pipeline)
Description
What happened?
When running inference on a model trained with multiple input data streams, while requesting output for only a subset of those streams via the --analysis_streams_output argument, incorrect data may be written to disk, depending on the order of the input streams.
Specifically, if the first input stream is omitted from --analysis_streams_output, the output of that omitted stream may be written to the Zarr directory of the second stream.
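The symptom described above looks like a positional-index mismatch. The following is a minimal sketch of how such a mismatch could arise. It is not taken from the WeatherGenerator code: the function names `write_by_position` and `write_by_name` and the placeholder output strings are hypothetical, and the actual cause may differ.

```python
# Hypothetical illustration only; none of these names come from the WeatherGenerator code.
all_streams = ["ERA5", "NPPATMS", "SurfaceCombined"]  # order used during training
requested = ["NPPATMS", "SurfaceCombined"]            # passed to --analysis_streams_output

# Stand-in for the per-stream model output, stored in training order.
outputs = [f"{name}-data" for name in all_streams]


def write_by_position(requested, outputs):
    """Suspected faulty pattern: index the outputs by position in the filtered list."""
    return {name: outputs[i] for i, name in enumerate(requested)}


def write_by_name(all_streams, requested, outputs):
    """Safer pattern: look up each output by its stream name."""
    by_name = dict(zip(all_streams, outputs))
    return {name: by_name[name] for name in requested}


buggy = write_by_position(requested, outputs)
fixed = write_by_name(all_streams, requested, outputs)
print(buggy)  # {'NPPATMS': 'ERA5-data', 'SurfaceCombined': 'NPPATMS-data'}
print(fixed)  # {'NPPATMS': 'NPPATMS-data', 'SurfaceCombined': 'SurfaceCombined-data'}
```

With positional indexing, every requested stream receives the data of the stream that sits at the same position in the full training list, which would match the observation that ERA5 data ends up under NPPATMS.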
What are the steps to reproduce the bug?
- Train a model on ERA5, NPPATMS and SurfaceCombined data (other stream combinations are possible here):
../WeatherGenerator-private/hpc/launch-slurm.py --config config/mixed.yml --time 5
- Run inference on the trained model, but omit the ERA5 stream:
uv run --offline inference --from_run_id xcl9xai1 --samples 2 --config ./config/mixed.yml --analysis_streams_output NPPATMS SurfaceCombined
- Read the target data from NPPATMS (which is supposed to have 22 channels) in an interactive Python shell:
>>> import dask.array as da
>>> arr = da.from_zarr("<PATH_TO_DATA>/validation_epoch00000_rank0000.zarr/0/NPPATMS/0/target/data")
>>> print(arr)
dask.array<from-zarr, shape=(24801, 70), dtype=float32, chunksize=(16392, 70), chunktype=numpy.ndarray>
Thus, we get data for 70 channels instead of the expected 22. In this example, 70 channels corresponds to the ERA5 data, so the NPPATMS directory contains ERA5 output.
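The manual check in the last step can be sketched as a small helper that compares observed array shapes against the expected channel count per stream. The function name `find_channel_mismatches` is hypothetical, and the sketch assumes that the channel dimension is the last axis, as in the shape printed above.

```python
def find_channel_mismatches(expected_channels, observed_shapes):
    """Return {stream: (expected, observed)} for streams with a wrong channel count.

    Assumes the channel dimension is the last axis of each shape.
    """
    mismatches = {}
    for stream, shape in observed_shapes.items():
        observed = shape[-1]
        expected = expected_channels[stream]
        if observed != expected:
            mismatches[stream] = (expected, observed)
    return mismatches


# Values taken from this report: NPPATMS should have 22 channels,
# but the array read from disk has shape (24801, 70).
expected_channels = {"NPPATMS": 22}
observed_shapes = {"NPPATMS": (24801, 70)}

mismatches = find_channel_mismatches(expected_channels, observed_shapes)
print(mismatches)  # {'NPPATMS': (22, 70)}
```

In practice, the shapes could be collected with `da.from_zarr(...).shape` for each stream directory in the output Zarr store.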
Version
develop
Platform (OS and architecture)
Linux
Relevant log output
See the "steps to reproduce" section above.
Accompanying data
No response
Organisation
JSC
Projects
Status
In Progress