Intake-esm polars' implementation incompatible with xscen's CSVs #618

Since last April, `intake-esm` can read CSV files with `polars`, which is pretty cool and should help for large ones like MRCC-disponible. However, `polars.scan_csv` does not take the same arguments as `pd.read_csv`, and our built-in `read_csv_kwargs` break with the new version.

This means the following:

### "variable" column

We use the read kwargs to convert variable tuple reps (e.g. `"('pr',)"`) to actual tuples. `intake-esm` already has a trick for that in the `columns_with_iterables=["variable"]` init arg, which we could use. However, the `polars` implementation expects _lists_ instead of tuples, so `['pr']` instead of `"('pr',)"`. This means we need to reformat _all_ previously existing catalogs before fixing this and upgrading `intake-esm`.
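To give an idea of what reformatting an existing catalog could look like, here is a minimal sketch of the per-cell conversion. The helper name `tuple_repr_to_list_repr` is hypothetical (it is not part of xscen or `intake-esm`), and it assumes the list form `polars` wants is simply the Python `repr` of a list, as in the `['pr']` example above.

```python
import ast


def tuple_repr_to_list_repr(cell: str) -> str:
    """Convert a tuple rep such as "('pr',)" to a list rep such as "['pr']".

    Hypothetical helper: cells that are not tuple reps are returned unchanged.
    """
    try:
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        # Not a Python literal (plain text, empty cell, etc.): leave as is.
        return cell
    if isinstance(value, tuple):
        return str(list(value))
    # Already a list rep, or some other literal: leave as is.
    return cell


print(tuple_repr_to_list_repr("('pr',)"))
print(tuple_repr_to_list_repr("('pr', 'tas')"))
```

With the catalog loaded in pandas, something like `df["variable"] = df["variable"].map(tuple_repr_to_list_repr)` before writing the CSV back would apply it to the whole column. Because cells that are already list reps pass through untouched, running it twice on the same catalog should be harmless.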

Sadly, this has the consequence of storing the variables in lists, which are not hashable. This triggers errors in the test suite, for example where we look for "duplicates" (`search_data_catalogs`). I am looking into that.
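One possible workaround, sketched here on plain dicts and not on the actual `search_data_catalogs` code, is to turn the lists back into tuples only when building the key used for the duplicate check. The names `hashable_key` and `find_duplicates` are hypothetical and only illustrate the idea.

```python
def hashable_key(row: dict) -> tuple:
    """Build a hashable key from a catalog row, turning list values into tuples."""
    return tuple(
        (col, tuple(val) if isinstance(val, list) else val)
        for col, val in sorted(row.items())
    )


def find_duplicates(rows: list) -> list:
    """Return the rows whose content was already seen earlier in `rows`."""
    seen = set()
    duplicates = []
    for row in rows:
        key = hashable_key(row)
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
    return duplicates


rows = [
    {"id": "a", "variable": ["pr"]},
    {"id": "a", "variable": ["pr"]},
    {"id": "b", "variable": ["tas"]},
]
print(find_duplicates(rows))
```

The stored values stay lists, which is what the `polars` reader produces, and the conversion to tuples only happens where hashing is needed.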

### xrfreq
The read kwargs also convert old `xrfreq` values to the "new" ones.

I say that enough time has passed and we can simply drop that. "Old" `xrfreq` values will trigger errors and that's ok.

 
### dtypes
The kwargs ensure `string[pyarrow]` as the dtype for the "path" column and "category" for the other string columns. We could change the way we specify those to match what `polars` wants, or simply test the performance without them. Maybe this is not needed anymore.
