Reading ARFF datasets using Deep Java Library (DJL).
Rather than explicitly defining all the features and labels manually, the dataset builder offers a number of methods that simplify specifying which columns are to be used as labels (i.e., class attributes/output variables) and which as features (i.e., input variables). It is also possible to specify columns to ignore completely.
Works with DJL version 0.21.0 and later.
Below is an example of how to load the UCI dataset iris, using the last column as the class attribute and only the features that match the regular expression petal.*:
import nz.ac.waikato.cms.adams.djl.dataset.ArffDataset;
import java.nio.file.Path;

ArffDataset dataset = ArffDataset.builder()
    .optArffFile(Path.of("src/main/resources/iris.arff"))  // ARFF file to load
    .setSampling(32, true)                                 // batch size 32, random sampling
    .classIsLast()                                         // last column is the class attribute
    .addMatchingFeatures("petal.*")                        // only petal columns as features
    .build();
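To illustrate which columns the pattern petal.* would pick up, here is a minimal stand-alone sketch using only java.util.regex. It does not call the library; the class name PetalMatchSketch and the helper matching are hypothetical, and it assumes that a column name has to match the regular expression as a whole, which may differ from how the builder applies its patterns. The column names are those of the commonly distributed iris.arff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PetalMatchSketch {

    // Returns the column names that match the regular expression as a whole.
    // Full-match semantics are an assumption made for this sketch.
    public static List<String> matching(List<String> columns, String regexp) {
        Pattern pattern = Pattern.compile(regexp);
        List<String> result = new ArrayList<>();
        for (String column : columns) {
            if (pattern.matcher(column).matches())
                result.add(column);
        }
        return result;
    }

    public static void main(String[] args) {
        // Attribute names of the commonly distributed iris.arff
        List<String> columns = List.of(
            "sepallength", "sepalwidth", "petallength", "petalwidth", "class");
        // prints [petallength, petalwidth]
        System.out.println(matching(columns, "petal.*"));
    }
}
```

Under these assumptions, the two sepal columns would be left out of the feature set, and the class column is handled separately by classIsLast().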
Here is an overview of the available ArffDataset.ArffBuilder methods:
- dateColumnsAsNumeric(): treats DATE attributes as NUMERIC instead of ignoring them
- stringColumnsAsNominal(): treats STRING attributes as NOMINAL instead of ignoring them
- classIndex(int...): sets the 0-based index/indices of the column(s) to use as class attribute(s)
- classIsFirst(): uses the first column as class attribute
- classIsLast(): uses the last column as class attribute
- addClassColumn(String...): adds the specified column(s) as class attribute(s)
- addIgnoredColumn(String...): specifies column(s) to be ignored
- ignoreMatchingColumns(String...): ignores columns that match the regexp(s)
- addAllFeatures(): adds all columns as features that are neither ignored nor class attributes
- addMatchingFeatures(String...): adds all columns as features that match the regexp(s) and are neither ignored nor class attributes
- optArffFile(Path): the path of the ARFF file to load
- optArffUrl(String): the URL of the ARFF file to load
- fromJson: instantiates the builder from JSON settings (as provided by ArffDataset.toJson)
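The way class, ignored, and feature columns relate to each other can be sketched in plain Java. This is not the library's implementation: the class ColumnRolesSketch, the method assignRoles, and the column names are hypothetical. It assumes that the class attribute takes precedence, that ignored columns are checked before features, and that a regular expression has to match the whole column name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class ColumnRolesSketch {

    // Assigns each column one of the roles "class", "ignored", "feature" or "unused".
    // The precedence class > ignored > feature is an assumption of this sketch.
    public static Map<String, String> assignRoles(List<String> columns, int classIndex,
                                                  String ignoreRegexp, String featureRegexp) {
        Pattern ignore = Pattern.compile(ignoreRegexp);
        Pattern feature = Pattern.compile(featureRegexp);
        Map<String, String> roles = new LinkedHashMap<>();  // keeps column order
        for (int i = 0; i < columns.size(); i++) {
            String name = columns.get(i);
            if (i == classIndex)
                roles.put(name, "class");
            else if (ignore.matcher(name).matches())
                roles.put(name, "ignored");
            else if (feature.matcher(name).matches())
                roles.put(name, "feature");
            else
                roles.put(name, "unused");
        }
        return roles;
    }

    public static void main(String[] args) {
        // Hypothetical column names, for illustration only
        List<String> columns = List.of("id", "age", "weight", "height", "target");
        int last = columns.size() - 1;  // 0-based index of the last column
        // prints {id=ignored, age=feature, weight=feature, height=feature, target=class}
        System.out.println(assignRoles(columns, last, "id", ".*"));
    }
}
```

In builder terms, this roughly mirrors combining classIsLast(), ignoreMatchingColumns("id") and addAllFeatures(): every column that is neither the class attribute nor ignored ends up as a feature.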
One of the following methods of the builder instance must be called:
- optArffFile
- optArffUrl
- fromJson
Some example classes for loading ARFF files:
- Load airline dataset
- Load bodyfat dataset (adding columns automatically)
- Load bodyfat dataset (explicitly adding columns)
- Load iris dataset
- Load iris dataset (STRING class attribute)
Add the following dependency to your pom.xml:
<dependency>
  <groupId>nz.ac.waikato.cms.adams</groupId>
  <artifactId>djl-arff</artifactId>
  <version>0.0.2</version>
</dependency>