
Commit 2e1f4df

drvinceknight authored and marcharper committed
Re write the result set
At present the result set plays the matches in parallel followed by a (sometimes computationally expensive) single process of reading in and analysing the interactions.

TLDR: This changes how the internal result set calculations are done, so that they are calculated more efficiently.

This happens by doing the following:

1. The various match by match calculations are done by the tournament (the winner of a match, the cooperation count, the score etc...).
2. These calculations are written to file (at present we just write the actual actions of an interaction).
3. The form of this file has also changed: for every match there are 2 rows. One row corresponds to each player. This might seem costly storage wise but is done to enable faster analysis (more on that later).
4. The analysis is now done entirely using `dask`: "Dask is a flexible parallel computing library for analytic computing." (https://dask.pydata.org/). This ensures all calculations are done on disk (so no huge memory problems) but also that they can be done **in parallel**. This is all done using the nice Pandas-like API that dask has, so essentially all calculations for the result set are just done by a few `groupby` statements.
5. There is *some* work being done outside of dask but that's just reshaping data. `dask` outputs `pandas.Series` and, to be consistent with our current setup, these are changed to be lists of lists etc...

**Some user facing changes** (which is why I suggest this would be for a `v4.0.0` release):

- `result_set.interactions` is no longer available. This is a choice and not forced by the redesign: I don't think this is ever necessary or helpful. The data file can always be read in.
- The ability to `tournament.play(in_memory=True)` has been removed. Again, not entirely a forced change (although it would be a tiny bit of work to allow this). Given the recent work to make everything work on Windows I don't think this is necessary, and removing it has allowed for a big deletion of code (which is a good thing re maintenance).
- The interactions data file is now in a different format; this is forced by the design choice.
- I have made a slight modification to `result_set.cooperation`. Currently this sets all self interactions to be 0 but I think that's not the right way to display it (note that the cooperation rates were all being calculated correctly).

**As well as ensuring the work done in series is reduced and the parallel workers also calculate the scores (which I think is more efficient)** this also seems to be faster:

On this branch:

```python
import axelrod as axl
players = [s() for s in axl.strategies if s.classifier["memory_depth"] == 1]
tournament = axl.Tournament(players, turns=200, repetitions=100)
results = tournament.play(processes=4)
```

Took: 1min 49s

```python
import axelrod as axl
players = [s() for s in axl.short_run_time_strategies]
tournament = axl.Tournament(players, turns=200, repetitions=20)
results = tournament.play(processes=4)
```

Took: 21min 2s

On current master:

```python
import axelrod as axl
players = [s() for s in axl.strategies if s.classifier["memory_depth"] == 1]
tournament = axl.Tournament(players, turns=200, repetitions=100)
results = tournament.play(processes=4)
```

Took: 2min 36s

```python
import axelrod as axl
players = [s() for s in axl.short_run_time_strategies]
tournament = axl.Tournament(players, turns=200, repetitions=20)
results = tournament.play(processes=4)
```

Took: 28min 8s

**One final aspect to consider** (I think) is that adding `dask` as a dependency opens up the potential to use its `delayed` decorator to do all our parallel processing. This would have the benefit of being able to use a distributed scheduler that `dask` has. (We'd have to investigate if this actually works based on our parallelisation but at least the possibility is there.)
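
The two-rows-per-match layout (point 3) and the `groupby`-style analysis (point 4) can be sketched roughly as follows. This is only an illustration using the standard library in place of `dask`, so it shows the shape of the calculation rather than the on-disk, parallel execution. The column names `Interaction index`, `Player index`, `Opponent index` and `Actions` are the ones read in the diff below; the `Score` column, the sample rows and the function name `mean_score_per_player` are assumptions made for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample file: two rows per match, one row for each player.
# The "Score" column is an assumption made for this sketch.
SAMPLE = """Interaction index,Player index,Opponent index,Actions,Score
0,0,1,CCD,11
0,1,0,CCC,6
1,0,1,CDD,9
1,1,0,CDC,4
"""


def mean_score_per_player(handle):
    """Group rows by player index and average the (assumed) Score column."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(handle):
        player = int(row["Player index"])
        totals[player] += float(row["Score"])
        counts[player] += 1
    return {player: totals[player] / counts[player] for player in totals}


mean_scores = mean_score_per_player(io.StringIO(SAMPLE))
print(mean_scores)
```

Because each player has its own row, a per-player statistic is a single grouping on `Player index`; there is no need to unpack both sides of a match first, which is the trade-off for the extra storage.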
1 parent a79ccab commit 2e1f4df

27 files changed: +1073 -1614 lines changed

.gitignore

Lines changed: 10 additions & 1 deletion

@@ -4,7 +4,7 @@ cache.txt
 test.csv
 summary.csv
 basic_tournament.csv
-test_outputs/*csv
+test_outputs/*csv.summary
 test_outputs/*svg
 test_outputs/*cache
 
@@ -118,3 +118,12 @@ docker-compose.yml
 
 # Mypy files
 .mypy_cache/
+
+test_outputs/stochastic_tournament_0.csv
+test_outputs/stochastic_tournament_1.csv
+test_outputs/test_fingerprint.csv
+test_outputs/test_fingerprint_tmp.csv
+test_outputs/test_results_from_file.csv
+test_outputs/test_results_from_file_tmp.csv
+test_outputs/test_tournament.csv
+test_outputs/tran_fin.csv

axelrod/__init__.py

Lines changed: 1 addition & 1 deletion

@@ -18,7 +18,7 @@
 from .deterministic_cache import DeterministicCache
 from .match_generator import *
 from .tournament import Tournament
-from .result_set import ResultSet, ResultSetFromFile
+from .result_set import ResultSet
 from .ecosystem import Ecosystem
 from .fingerprint import AshlockFingerprint, TransitiveFingerprint
 

axelrod/ecosystem.py

Lines changed: 4 additions & 4 deletions

@@ -12,7 +12,7 @@ def __init__(self, results: ResultSet,
                  population: List[int] = None) -> None:
 
         self.results = results
-        self.nplayers = self.results.nplayers
+        self.num_players = self.results.num_players
         self.payoff_matrix = self.results.payoff_matrix
         self.payoff_stddevs = self.results.payoff_stddevs
 
@@ -27,15 +27,15 @@ def __init__(self, results: ResultSet,
             if min(population) < 0:
                 raise TypeError(
                     "Minimum value of population vector must be non-negative")
-            elif len(population) != self.nplayers:
+            elif len(population) != self.num_players:
                 raise TypeError(
                     "Population vector must be same size as number of players")
             else:
                 norm = sum(population)
                 self.population_sizes = [[p / norm for p in population]]
         else:
             self.population_sizes = [
-                [1 / self.nplayers for _ in range(self.nplayers)]]
+                [1 / self.num_players for _ in range(self.num_players)]]
 
         # This function is quite arbitrary and probably only influences the
         # kinetics for the current code.
@@ -47,7 +47,7 @@ def __init__(self, results: ResultSet,
     def reproduce(self, turns: int):
 
         for iturn in range(turns):
-            plist = list(range(self.nplayers))
+            plist = list(range(self.num_players))
             pops = self.population_sizes[-1]
 
             # The unit payoff for each player in this turn is the sum of the

axelrod/fingerprint.py

Lines changed: 19 additions & 21 deletions

@@ -6,6 +6,8 @@
 import matplotlib.pyplot as plt
 import numpy as np
 import tqdm
+import dask.dataframe as dd
+import dask as da
 from mpl_toolkits.axes_grid1 import make_axes_locatable
 
 import axelrod as axl
@@ -266,7 +268,7 @@ def construct_tournament_elements(self, step: float,
 
     def fingerprint(
         self, turns: int = 50, repetitions: int = 10, step: float = 0.01,
-        processes: int=None, filename: str = None, in_memory: bool = False,
+        processes: int=None, filename: str = None,
         progress_bar: bool = True
     ) -> dict:
         """Build and play the spatial tournament.
@@ -290,10 +292,7 @@ def fingerprint(
             The number of processes to be used for parallel processing
         filename: str, optional
             The name of the file for self.spatial_tournament's interactions.
-            if None and in_memory=False, will auto-generate a filename.
-        in_memory: bool
-            Whether self.spatial_tournament keeps interactions_dict in memory or
-            in a file.
+            if None, will auto-generate a filename.
         progress_bar : bool
             Whether or not to create a progress bar which will be updated
 
@@ -305,7 +304,7 @@ def fingerprint(
         """
 
         temp_file_descriptor = None
-        if not in_memory and filename is None:
+        if filename is None:
             temp_file_descriptor, filename = mkstemp()
 
         edges, tourn_players = self.construct_tournament_elements(
@@ -318,13 +317,10 @@ def fingerprint(
         self.spatial_tournament.play(build_results=False,
                                      filename=filename,
                                      processes=processes,
-                                     in_memory=in_memory,
                                      progress_bar=progress_bar)
-        if in_memory:
-            self.interactions = self.spatial_tournament.interactions_dict
-        else:
-            self.interactions = read_interactions_from_file(
-                filename, progress_bar=progress_bar)
+
+        self.interactions = read_interactions_from_file(
+            filename, progress_bar=progress_bar)
 
         if temp_file_descriptor is not None:
             os.close(temp_file_descriptor)
@@ -483,17 +479,19 @@ def analyse_cooperation_ratio(filename):
         opponent in each turn. The ith row corresponds to the ith opponent
         and the jth column the jth turn.
     """
-    did_c = np.vectorize(lambda action: int(action == 'C'))
+    did_c = np.vectorize(lambda actions: [int(action == 'C')
+                                          for action in actions])
 
     cooperation_rates = {}
-    with open(filename, "r") as f:
-        reader = csv.reader(f)
-        for row in reader:
-            opponent_index, player_history = int(row[1]), list(row[4])
-            if opponent_index in cooperation_rates:
-                cooperation_rates[opponent_index].append(did_c(player_history))
-            else:
-                cooperation_rates[opponent_index] = [did_c(player_history)]
+    df = dd.read_csv(filename)
+    df = df[df["Player index"] == 0][["Opponent index", "Actions"]]
+
+    for _, row in df.iterrows():
+        opponent_index, player_history = row["Opponent index"], row["Actions"]
+        if opponent_index in cooperation_rates:
+            cooperation_rates[opponent_index].append(did_c(player_history))
+        else:
+            cooperation_rates[opponent_index] = [did_c(player_history)]
 
     for index, rates in cooperation_rates.items():
         cooperation_rates[index] = np.mean(rates, axis=0)
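
The calculation in `analyse_cooperation_ratio` above keeps only the rows where `Player index` is 0, turns each `Actions` string into a list of 0/1 cooperation indicators, groups these by opponent and averages them turn by turn. A minimal sketch of that logic, using plain Python in place of `dask` and `numpy`, might look like this; the function name `cooperation_rates_by_opponent` and the sample rows are hypothetical.

```python
from collections import defaultdict


def cooperation_rates_by_opponent(rows):
    """Per-turn cooperation rate of player 0 against each opponent.

    rows: iterable of (player_index, opponent_index, actions) tuples, where
    actions is a string such as "CCD". Only rows with player_index == 0 are
    used, mirroring the filter in the diff.
    """
    indicators = defaultdict(list)
    for player_index, opponent_index, actions in rows:
        if player_index != 0:
            continue
        indicators[opponent_index].append([int(a == "C") for a in actions])
    # zip(*histories) regroups the repetitions turn by turn before averaging.
    return {
        opponent: [sum(turn) / len(turn) for turn in zip(*histories)]
        for opponent, histories in indicators.items()
    }


# Hypothetical rows: two repetitions against opponent 1, one against opponent 2.
rows = [
    (0, 1, "CCD"),
    (1, 0, "CCC"),
    (0, 1, "CDD"),
    (1, 0, "CDC"),
    (0, 2, "DDD"),
    (2, 0, "CCC"),
]
rates = cooperation_rates_by_opponent(rows)
print(rates)
```

The sketch assumes every repetition against a given opponent has the same number of turns, as `zip` would otherwise silently truncate to the shortest history.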

axelrod/interaction_utils.py

Lines changed: 15 additions & 23 deletions

@@ -10,6 +10,7 @@
 from collections import Counter
 import csv
 import tqdm
+import pandas as pd
 
 from axelrod.action import Action, str_to_actions
 from .game import Game
@@ -239,36 +240,27 @@ def compute_sparklines(interactions, c_symbol='█', d_symbol=' '):
             sparkline(histories[1], c_symbol, d_symbol))
 
 
-def read_interactions_from_file(filename, progress_bar=True,
-                                num_interactions=False):
+def read_interactions_from_file(filename,
+                                progress_bar=True,
+                                ):
     """
     Reads a file and returns a dictionary mapping tuples of player pairs to
     lists of interactions
     """
+    df = pd.read_csv(filename)[["Interaction index", "Player index",
+                                "Opponent index", "Actions"]]
+    groupby = df.groupby("Interaction index")
     if progress_bar:
-        if not num_interactions:
-            with open(filename) as f:
-                num_interactions = sum(1 for line in f)
-        progress_bar = tqdm.tqdm(total=num_interactions, desc="Loading")
+        groupby = tqdm.tqdm(groupby)
 
     pairs_to_interactions = {}
-    with open(filename, 'r') as f:
-        for row in csv.reader(f):
-            index_pair = (int(row[0]), int(row[1]))
-            p1_actions = str_to_actions(row[4])
-            p2_actions = str_to_actions(row[5])
-            interaction = list(zip(p1_actions, p2_actions))
-
-            try:
-                pairs_to_interactions[index_pair].append(interaction)
-            except KeyError:
-                pairs_to_interactions[index_pair] = [interaction]
-
-            if progress_bar:
-                progress_bar.update()
-
-    if progress_bar:
-        progress_bar.close()
+    for _, d in tqdm.tqdm(groupby):
+        key = tuple(d[["Player index", "Opponent index"]].iloc[0])
+        value = list(map(str_to_actions, zip(*d["Actions"])))
+        try:
+            pairs_to_interactions[key].append(value)
+        except KeyError:
+            pairs_to_interactions[key] = [value]
     return pairs_to_interactions
 
 
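
The rewritten `read_interactions_from_file` groups the two rows of each match by `Interaction index` and zips their `Actions` strings back into per-turn pairs. The sketch below illustrates that reshaping with the standard library only, keeping plain `'C'`/`'D'` characters instead of converting them with `str_to_actions`; the function name `pair_interactions` and the sample rows are hypothetical.

```python
from collections import defaultdict


def pair_interactions(rows):
    """Rebuild per-turn action pairs from a two-rows-per-match layout.

    rows: iterable of (interaction_index, player_index, opponent_index,
    actions) tuples. Returns a dict mapping (player, opponent) index pairs
    to a list of interactions, each a list of per-turn action tuples.
    """
    by_interaction = defaultdict(list)
    for interaction_index, player, opponent, actions in rows:
        by_interaction[interaction_index].append((player, opponent, actions))

    pairs_to_interactions = defaultdict(list)
    for interaction_index in sorted(by_interaction):
        match_rows = by_interaction[interaction_index]
        # The first row of the match supplies the (player, opponent) key.
        key = (match_rows[0][0], match_rows[0][1])
        histories = [row[2] for row in match_rows]
        # zip(*histories) pairs up the two players' actions turn by turn.
        pairs_to_interactions[key].append(list(zip(*histories)))
    return dict(pairs_to_interactions)


# Hypothetical rows: two repetitions of the match between players 0 and 1.
rows = [
    (0, 0, 1, "CCD"),
    (0, 1, 0, "CCC"),
    (1, 0, 1, "CDD"),
    (1, 1, 0, "CDC"),
]
interactions = pair_interactions(rows)
print(interactions)
```

Keying on the first row of each group relies on the file listing the rows of a match in a consistent player order, which is an assumption of this sketch.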

axelrod/plot.py

Lines changed: 8 additions & 8 deletions

@@ -25,7 +25,7 @@ def default_cmap(version: str = "2.0") -> str:
 class Plot(object):
     def __init__(self, result_set: ResultSet) -> None:
         self.result_set = result_set
-        self.nplayers = self.result_set.nplayers
+        self.num_players = self.result_set.num_players
         self.players = self.result_set.players
 
     def _violinplot(
@@ -40,16 +40,16 @@ def _violinplot(
             ax = ax
 
         figure = ax.get_figure()
-        width = max(self.nplayers / 3, 12)
+        width = max(self.num_players / 3, 12)
         height = width / 2
         spacing = 4
-        positions = spacing * arange(1, self.nplayers + 1, 1)
+        positions = spacing * arange(1, self.num_players + 1, 1)
         figure.set_size_inches(width, height)
         ax.violinplot(data, positions=positions, widths=spacing / 2,
                       showmedians=True, showextrema=False)
         ax.set_xticks(positions)
         ax.set_xticklabels(names, rotation=90)
-        ax.set_xlim([0, spacing * (self.nplayers + 1)])
+        ax.set_xlim([0, spacing * (self.num_players + 1)])
         ax.tick_params(axis='both', which='both', labelsize=8)
         if title:
             ax.set_title(title)
@@ -175,14 +175,14 @@ def _payoff_heatmap(
             ax = ax
 
         figure = ax.get_figure()
-        width = max(self.nplayers / 4, 12)
+        width = max(self.num_players / 4, 12)
         height = width
         figure.set_size_inches(width, height)
         matplotlib_version = matplotlib.__version__
         cmap = default_cmap(matplotlib_version)
         mat = ax.matshow(data, cmap=cmap)
-        ax.set_xticks(range(self.result_set.nplayers))
-        ax.set_yticks(range(self.result_set.nplayers))
+        ax.set_xticks(range(self.result_set.num_players))
+        ax.set_yticks(range(self.result_set.num_players))
         ax.set_xticklabels(names, rotation=90)
         ax.set_yticklabels(names)
         ax.tick_params(axis='both', which='both', labelsize=16)
@@ -246,7 +246,7 @@ def stackplot(
         ticks = []
         for i, n in enumerate(self.result_set.ranked_names):
             x = -0.01
-            y = (i + 0.5) * 1 / self.result_set.nplayers
+            y = (i + 0.5) * 1 / self.result_set.num_players
             ax.annotate(
                 n, xy=(x, y), xycoords=trans, clip_on=False, va='center',
                 ha='right', fontsize=5)
