2 files changed (+6, -0)
@@ -274,6 +274,11 @@ Changelog
 - |Fix| :func:`datasets.fetch_openml` returns improved data types when
   `as_frame=True` and `parser="liac-arff"`. :pr:`26386` by `Thomas Fan`_.
 
+- |Fix| Following the ARFF specs, only the marker `"?"` is now considered a missing
+  value when opening ARFF files fetched using :func:`datasets.fetch_openml` when using
+  the pandas parser. The parameter `read_csv_kwargs` allows to overwrite this behaviour.
+  :pr:`26551` by :user:`Guillaume Lemaitre <glemaitre>`.
+
 - |Enhancement| Allows to overwrite the parameters used to open the ARFF file using
   the parameter `read_csv_kwargs` in :func:`datasets.fetch_openml` when using the
   pandas parser.
@@ -387,6 +387,7 @@ def _pandas_arff_parser(
         "header": None,
         "index_col": False,  # always force pandas to not use the first column as index
         "na_values": ["?"],  # missing values are represented by `?`
+        "keep_default_na": False,  # only `?` is a missing value given the ARFF specs
         "comment": "%",  # skip line starting by `%` since they are comments
         "quotechar": '"',  # delimiter to use for quoted strings
         "skipinitialspace": True,  # skip spaces after delimiter to follow ARFF specs
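
To illustrate what the added `keep_default_na` option changes, here is a minimal sketch that calls `pandas.read_csv` directly on a small in-memory sample. The sample data and the variable names (`arff_kwargs`, `strict`, `lenient`) are assumptions made for illustration and are not part of this change; only a subset of the options from the hunk above is used.

```python
import io

import pandas as pd

# Hypothetical data section: "?" marks a missing value, while "NA" and
# "null" stand in for legitimate category labels.
data = io.StringIO("1.0,NA,?\n?,null,b\n")

# Subset of the options shown in the hunk above.
arff_kwargs = {
    "header": None,
    "index_col": False,
    "na_values": ["?"],
    "keep_default_na": False,
}

# Only "?" is converted to NaN; "NA" and "null" are kept as strings.
strict = pd.read_csv(data, **arff_kwargs)

# Overriding the option brings back pandas' default missing-value markers,
# so "NA" and "null" are converted to NaN as well.
data.seek(0)
lenient = pd.read_csv(data, **{**arff_kwargs, "keep_default_na": True})

n_missing_strict = int(strict.isna().sum().sum())
n_missing_lenient = int(lenient.isna().sum().sum())
print(n_missing_strict, n_missing_lenient)
```

With this input, the first call should flag only the two `?` cells as missing, while the second also treats `NA` and `null` as missing. Passing a similar override through `read_csv_kwargs` in `datasets.fetch_openml` is how the changelog entry describes restoring the previous behaviour.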