The dataprovider module can be used independently of the rest of this repository. It provides a convenient way to read the Matches.csv file into memory partially, as needed.
To use the dataprovider module, you may need up to about 11 GB of free disk space:
- ~3.9 GB to store the Matches.csv file
- ~5 GB for temporary data while Matches.csv is initially converted to Matches.npz
- ~1.3 GB to store the Matches.npz file
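As a quick pre-flight check, the free space on the target drive can be compared against that figure with Python's standard `shutil.disk_usage`. This is a minimal sketch, not part of dataprovider: `has_enough_space` is a hypothetical helper, and it assumes "GB" means 1024³ bytes.

```python
import shutil

# Rough peak requirement in GB, rounded up from 3.9 + 5 + 1.3 (= 10.2).
REQUIRED_GB = 11


def has_enough_space(path: str, required_gb: float = REQUIRED_GB) -> bool:
    """Hypothetical helper: True if the drive holding `path` has enough free space.

    Assumes 1 GB = 1024**3 bytes.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3


# Check the drive that will hold the data directory (here: the current directory).
print("Enough free space:", has_enough_space("."))
```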
You need a single data directory, located anywhere on your PC, with the following file structure (essentially a git clone of the data repository plus the Matches.csv file):
- champion_names.csv
- spell_names.csv
- Matches.csv
- columns
- interesting
- interesting.csv
- known
- unknown
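Before creating the provider, it can help to confirm that the directory has this layout. The sketch below checks only the top-level entries; it assumes `columns` is a subdirectory (its contents are not checked), and `missing_entries` is a hypothetical helper, not part of dataprovider.

```python
from pathlib import Path

# Top-level entries expected in the data directory, taken from the list above.
EXPECTED_FILES = ("champion_names.csv", "spell_names.csv", "Matches.csv")
# Assumption: "columns" is a directory; its contents are not checked here.
EXPECTED_DIRS = ("columns",)


def missing_entries(data_dir: str) -> list[str]:
    """Return the expected top-level entries that are absent from data_dir."""
    root = Path(data_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

For example, `missing_entries("C:\\Path\\to\\data\\")` returns an empty list when everything is in place.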
```python
import numpy as np

import dataprovider

# Assuming the data repository was cloned to `C:\Path\to\data\`.
# Create the provider, using float32 as the data type.
data = dataprovider.CorpusProvider("C:\\Path\\to\\data\\", np.dtype(np.float32))

# Get the 'interesting' data without 'win'.
interesting_data = data.interesting_without_win

# Use only the first half of that data.
interesting_first_half = np.array_split(interesting_data, 2)[0]

# All available data partitions, keyed by name:
partitions = {
    "known": data.known,
    "unknown": data.unknown,
    "unknown_without_win": data.unknown_without_win,
    "interesting": data.interesting,
    "interesting_without_win": data.interesting_without_win,
}

# Print the shape of each partition.
for name, partition in partitions.items():
    print(f"{name} with shape {partition.shape!r}")
```
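The example above needs the real data directory. To see how `np.array_split` divides a partition without it, the sketch below uses a small synthetic matrix as a stand-in; it assumes only that partitions are 2-D NumPy arrays, as the `.shape` printout suggests.

```python
import numpy as np

# Stand-in for a partition such as data.interesting_without_win:
# 5 rows x 3 columns of float32 values.
matrix = np.arange(15, dtype=np.float32).reshape(5, 3)

# np.array_split splits along axis 0 (rows) and, unlike np.split,
# accepts a row count that is not divisible by the number of sections.
first_half, second_half = np.array_split(matrix, 2)

print(first_half.shape)   # (3, 3): the extra row goes to the first part
print(second_half.shape)  # (2, 3)
```

With an odd number of rows, the "first half" is therefore one row larger than the second.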
Further documentation is not yet written.