-
I have a dict whose name is
Must I convert the dict to a pandas.DataFrame and then convert the DataFrame to a Dataset? I then ran into a new problem when I tried to save my dataset:
It crashed with this report:
My test_ds looks like this:
Replies: 2 comments 1 reply
-
Hi @forestbat , sorry to hear you're having problems.
It doesn't look like your input follows the format required by Dataset.from_dict, which expects something like:
d = {
    "var": {"dims": ("t", "x"), "data": [[data1, data2, ...]]},
    "t": {"dims": ("t"), "data": [datetime64]},
}
ds = xr.Dataset.from_dict(d)
where here we would be creating a length-1
If your data is already in a file format (e.g. netcdf) I encourage you to create your dataset with
You should never need to do this. I recommend you read our page on data structures and first decide how you want your dataset to be structured, then you should be able to use
This error message is telling you what the problem is. Your
However, it looks to me as if you are using many different variables to store different values of your data, when you should instead be storing those values as varying along a dimension of the dataset. Can you represent your data as a single numpy array first?
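As a rough sketch of what "a single numpy array along a dimension" could look like: the variable name data, the dimension names time and columns, and the sample values below are all placeholders, not taken from your dataset.

```python
import numpy as np
import xarray as xr

# Placeholder values: three timesteps, six values per timestep.
times = np.array(
    ["2022-10-24T00:00", "2022-10-24T01:00", "2022-10-24T02:00"],
    dtype="datetime64[ns]",
)
values = np.array(
    [
        [0, 1, 2, 3, 4, 5],
        [1, 2, 3, 4, 5, 6],
        [2, 3, 4, 5, 6, 7],
    ]
)

# One 2D variable whose rows vary along "time" and whose columns vary
# along "columns", instead of one separate variable per value.
ds = xr.Dataset(
    data_vars={"data": (("time", "columns"), values)},
    coords={"time": times},
)
print(ds)
```

Keeping everything in one variable means selections such as ds["data"].isel(time=0) work on the whole row at once.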
-
Given the structure you posted in #7202 (reply in thread) (mapping of timestep to row), you can translate to something .from_dict accepts with:
In [8]: import xarray as xr
   ...: from datetime import datetime
   ...:
   ...: original_data = {
   ...:     datetime(2022, 10, 24, 0, 0, 0): [0, 1, 2, 3, 4, 5],
   ...:     datetime(2022, 10, 24, 1, 0, 0): [1, 2, 3, 4, 5, 6],
   ...:     datetime(2022, 10, 24, 2, 0, 0): [2, 3, 4, 5, 6, 7],
   ...: }
   ...: data = {
   ...:     "time": {"dims": "time", "data": list(original_data.keys())},
   ...:     "data": {"dims": ["time", "columns"], "data": list(original_data.values())},
   ...: }
   ...: ds = xr.Dataset.from_dict(data)
   ...: ds
Out[8]:
<xarray.Dataset>
Dimensions: (time: 3, columns: 6)
Coordinates:
* time (time) datetime64[ns] 2022-10-24 ... 2022-10-24T02:00:00
Dimensions without coordinates: columns
Data variables:
data (time, columns) int64 0 1 2 3 4 5 1 2 3 4 5 6 2 3 4 5 6 7
In [9]: ds.data
Out[9]:
<xarray.DataArray 'data' (time: 3, columns: 6)>
array([[0, 1, 2, 3, 4, 5],
[1, 2, 3, 4, 5, 6],
[2, 3, 4, 5, 6, 7]])
Coordinates:
* time (time) datetime64[ns] 2022-10-24 ... 2022-10-24T02:00:00
Dimensions without coordinates: columns
This will give you a dataset with the structure @TomNicholas suggested. If you actually need the structure you described in #7202 (comment) (one variable per column), you can get that by converting the dataset:
In [10]: by_column = ds.data.to_dataset(dim="columns")
...: by_column
Out[10]:
<xarray.Dataset>
Dimensions: (time: 3)
Coordinates:
* time (time) datetime64[ns] 2022-10-24 ... 2022-10-24T02:00:00
Data variables:
0 (time) int64 0 1 2
1 (time) int64 1 2 3
2 (time) int64 2 3 4
3 (time) int64 3 4 5
4 (time) int64 4 5 6
5 (time) int64 5 6 7
However, the netcdf file format does not seem to allow integer variable names (even though
In [11]: renamed = by_column.rename({name: str(name) for name in by_column.data_vars.keys()})
...: renamed
Out[11]:
<xarray.Dataset>
Dimensions: (time: 3)
Coordinates:
* time (time) datetime64[ns] 2022-10-24 ... 2022-10-24T02:00:00
Data variables:
0 (time) int64 0 1 2
1 (time) int64 1 2 3
2 (time) int64 2 3 4
3 (time) int64 3 4 5
4 (time) int64 4 5 6
5 (time) int64 5 6 7
and save that.
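If it helps, the rename mapping can be built separately so it is easy to inspect before saving. The sketch below is plain Python: string_names is a hypothetical helper (not part of xarray), and the col_ prefix is just an assumption, chosen so the resulting names do not start with a digit.

```python
def string_names(names, prefix="col_"):
    """Map every non-string name to a prefixed string; skip names that are already strings."""
    return {
        name: f"{prefix}{name}"
        for name in names
        if not isinstance(name, str)
    }


# Integer names get a string replacement; existing string names are left out.
mapping = string_names([0, 1, 2, "already_a_string"])
print(mapping)
```

With the dataset from In [10], this could be used as by_column.rename(string_names(by_column.data_vars)), and the result saved with to_netcdf.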