Commit 068b57a

Merge pull request #28 from jajupmochi/v0.2
V0.2
2 parents be4f169 + 7f66196 commit 068b57a

27 files changed: +599 -82 lines changed

.appveyor.yml

Lines changed: 3 additions & 5 deletions
@@ -1,7 +1,5 @@
 environment:
   matrix:
-    - PYTHON: "C:\\Python35"
-    - PYTHON: "C:\\Python35-x64"
     - PYTHON: "C:\\Python36"
     - PYTHON: "C:\\Python36-x64"
     - PYTHON: "C:\\Python37"
@@ -17,12 +15,12 @@ environment:
 
 install:
   - "%PYTHON%\\python.exe -m pip install -U pip"
-  - "%PYTHON%\\python.exe -m pip install -U pytest"
-  - "%PYTHON%\\python.exe -m pip install -r requirements.txt"
   - "%PYTHON%\\python.exe -m pip install wheel"
+  - "%PYTHON%\\python.exe -m pip install -r requirements.txt"
+  - "%PYTHON%\\python.exe -m pip install -U pytest"
 
 build: off
 
 test_script:
   - "%PYTHON%\\python.exe setup.py bdist_wheel"
-  - "%PYTHON%\\python.exe -m pytest -v gklearn/tests/"
+  - "%PYTHON%\\python.exe -m pytest -v gklearn/tests/ --ignore=gklearn/tests/test_median_preimage_generator.py"

.travis.yml

Lines changed: 0 additions & 1 deletion
@@ -1,7 +1,6 @@
 language: python
 
 python:
-  - '3.5'
   - '3.6'
   - '3.7'
   - '3.8'

README.md

Lines changed: 13 additions & 13 deletions
@@ -9,7 +9,7 @@ A Python package for graph kernels, graph edit distances and graph pre-image pro
 
 ## Requirements
 
-* python>=3.5
+* python>=3.6
 * numpy>=1.16.2
 * scipy>=1.1.0
 * matplotlib>=3.1.0
@@ -65,27 +65,27 @@ The docs of the library can be found [here](https://graphkit-learn.readthedocs.i
 ### 1 List of graph kernels
 
 * Based on walks
-  * [The common walk kernel](gklearn/kernels/common_walk.py) [1]
+  * [The common walk kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/common_walk.py) [1]
     * Exponential
     * Geometric
-  * [The marginalized kernel](gklearn/kernels/marginalized.py)
+  * [The marginalized kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/marginalized.py)
     * With tottering [2]
     * Without tottering [7]
-  * [The generalized random walk kernel](gklearn/kernels/random_walk.py) [3]
-    * [Sylvester equation](gklearn/kernels/sylvester_equation.py)
+  * [The generalized random walk kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/random_walk.py) [3]
+    * [Sylvester equation](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/sylvester_equation.py)
     * Conjugate gradient
     * Fixed-point iterations
-    * [Spectral decomposition](gklearn/kernels/spectral_decomposition.py)
+    * [Spectral decomposition](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/spectral_decomposition.py)
 * Based on paths
-  * [The shortest path kernel](gklearn/kernels/shortest_path.py) [4]
-  * [The structural shortest path kernel](gklearn/kernels/structural_sp.py) [5]
-  * [The path kernel up to length h](gklearn/kernels/path_up_to_h.py) [6]
+  * [The shortest path kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/shortest_path.py) [4]
+  * [The structural shortest path kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/structural_sp.py) [5]
+  * [The path kernel up to length h](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/path_up_to_h.py) [6]
     * The Tanimoto kernel
     * The MinMax kernel
 * Non-linear kernels
-  * [The treelet kernel](gklearn/kernels/treelet.py) [10]
-  * [Weisfeiler-Lehman kernel](gklearn/kernels/weisfeiler_lehman.py) [11]
-    * [Subtree](gklearn/kernels/weisfeiler_lehman.py#L479)
+  * [The treelet kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/treelet.py) [10]
+  * [Weisfeiler-Lehman kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/weisfeiler_lehman.py) [11]
+    * [Subtree](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/weisfeiler_lehman.py#L479)
 
 A demo of computing graph kernels can be found on [Google Colab](https://colab.research.google.com/drive/17Q2QCl9CAtDweGF8LiWnWoN2laeJqT0u?usp=sharing) and in the [`examples`](https://github.com/jajupmochi/graphkit-learn/blob/master/gklearn/examples/compute_graph_kernel.py) folder.
 
@@ -97,7 +97,7 @@ A demo of generating graph preimages can be found on [Google Colab](https://cola
 
 ### 4 Interface to `GEDLIB`
 
-[`GEDLIB`](https://github.com/dbblumenthal/gedlib) is an easily extensible C++ library for (suboptimally) computing the graph edit distance between attributed graphs. [A Python interface](gklearn/gedlib) for `GEDLIB` is integrated in this library, based on the [`gedlibpy`](https://github.com/Ryurin/gedlibpy) library.
+[`GEDLIB`](https://github.com/dbblumenthal/gedlib) is an easily extensible C++ library for (suboptimally) computing the graph edit distance between attributed graphs. [A Python interface](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/gedlib) for `GEDLIB` is integrated in this library, based on the [`gedlibpy`](https://github.com/Ryurin/gedlibpy) library.
 
 ### 5 Computation optimization methods
 

gklearn/examples/ged/__init__.py

Whitespace-only changes.

gklearn/examples/kernels/__init__.py

Whitespace-only changes.
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+# -*- coding: utf-8 -*-
+"""compute_graph_kernel_v0.1.ipynb
+
+Automatically generated by Colaboratory.
+
+Original file is located at
+    https://colab.research.google.com/drive/10jUz7-ahPiE_T1qvFrh2NvCVs1e47noj
+
+**This script demonstrates how to compute a graph kernel.**
+---
+
+**0. Install `graphkit-learn`.**
+"""
+
+"""**1. Get dataset.**"""
+
+from gklearn.utils.graphfiles import loadDataset
+
+graphs, targets = loadDataset('../../../datasets/MUTAG/MUTAG_A.txt')
+
+"""**2. Compute graph kernel.**"""
+
+from gklearn.kernels import untilhpathkernel
+
+gram_matrix, run_time = untilhpathkernel(
+    graphs, # The list of input graphs.
+    depth=5, # The longest length of paths.
+    k_func='MinMax', # Or 'tanimoto'.
+    compute_method='trie', # Or 'naive'.
+    n_jobs=1, # The number of jobs to run in parallel.
+    verbose=True)
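
The `k_func` argument in the file above chooses between `'MinMax'` and `'tanimoto'`. As a rough, self-contained illustration of the usual textbook definitions of those two similarities (this is not gklearn's implementation; the helper names `minmax_kernel` and `tanimoto_kernel` and the toy path counts are hypothetical), they can be sketched on dictionaries that map a path label sequence to its number of occurrences:

```python
from collections import Counter


def minmax_kernel(paths_a, paths_b):
    """MinMax similarity: sum of per-path minima over sum of per-path maxima."""
    keys = set(paths_a) | set(paths_b)
    num = sum(min(paths_a.get(k, 0), paths_b.get(k, 0)) for k in keys)
    den = sum(max(paths_a.get(k, 0), paths_b.get(k, 0)) for k in keys)
    return num / den if den else 0.0


def tanimoto_kernel(paths_a, paths_b):
    """Tanimoto similarity: shared distinct paths over all distinct paths."""
    set_a, set_b = set(paths_a), set(paths_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Hypothetical path counts for two small graphs.
g1 = Counter({"C": 3, "C-C": 2, "C-O": 1})
g2 = Counter({"C": 2, "C-C": 1, "O": 1})

print(minmax_kernel(g1, g2))    # (2 + 1) / (3 + 2 + 1 + 1)
print(tanimoto_kernel(g1, g2))  # 2 shared paths / 4 distinct paths
```

MinMax takes the path multiplicities into account, while Tanimoto only looks at which paths occur at all.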
Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+# -*- coding: utf-8 -*-
+"""model_selection_old.ipynb
+
+Automatically generated by Colaboratory.
+
+Original file is located at
+    https://colab.research.google.com/drive/1uVkl7scNgEPrimX8ks6iEC5ijuhB8L_D
+
+**This script demonstrates how to compute a graph kernel.**
+---
+
+**0. Install `graphkit-learn`.**
+"""
+
+"""**1. Perform model selection and classification.**"""
+
+from gklearn.utils import model_selection_for_precomputed_kernel
+from gklearn.kernels import untilhpathkernel
+import numpy as np
+
+# Set parameters.
+datafile = '../../../datasets/MUTAG/MUTAG_A.txt'
+param_grid_precomputed = {'depth': np.linspace(1, 10, 10),
+    'k_func': ['MinMax', 'tanimoto'],
+    'compute_method': ['trie']}
+param_grid = {'C': np.logspace(-10, 10, num=41, base=10)}
+
+# Perform model selection and classification.
+model_selection_for_precomputed_kernel(
+    datafile, # The path of dataset file.
+    untilhpathkernel, # The graph kernel used for estimation.
+    param_grid_precomputed, # The parameters used to compute gram matrices.
+    param_grid, # The penalty parameters used for penalty items.
+    'classification', # Or 'regression'.
+    NUM_TRIALS=30, # The number of the random trials of the outer CV loop.
+    ds_name='MUTAG', # The name of the dataset.
+    n_jobs=1,
+    verbose=True)
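
To see what the two grids in the file above contain, the following sketch uses only NumPy and `itertools`. It assumes that each combination of `param_grid_precomputed` corresponds to one Gram matrix, which the comments suggest but this diff does not confirm. Note that `np.linspace(1, 10, 10)` yields floats; whether the kernel expects an integer `depth` is not visible here, so `int_depths` is shown only as a hypothetical alternative.

```python
from itertools import product

import numpy as np

param_grid_precomputed = {
    "depth": np.linspace(1, 10, 10),
    "k_func": ["MinMax", "tanimoto"],
    "compute_method": ["trie"],
}
c_values = np.logspace(-10, 10, num=41, base=10)

# Every combination of kernel parameters (assumed: one Gram matrix each).
combos = list(product(*param_grid_precomputed.values()))
print(len(combos))

# linspace returns floating-point values, not integers.
print(param_grid_precomputed["depth"].dtype)

# 41 points over 20 decades: neighbours differ by a factor of sqrt(10).
ratios = c_values[1:] / c_values[:-1]
print(len(c_values), np.allclose(ratios, np.sqrt(10)))

# Hypothetical integer alternative for the depth grid.
int_depths = np.arange(1, 11)
print(int_depths)
```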

gklearn/examples/preimage/__init__.py

Whitespace-only changes.
Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+Created on Tue Jun 16 15:41:26 2020
+
+@author: ljia
+
+**This script demonstrates how to generate a graph preimage using Boria's method with cost matrices learning.**
+"""
+
+"""**1. Get dataset.**"""
+
+from gklearn.utils import Dataset, split_dataset_by_target
+
+# Predefined dataset name, use dataset "MAO".
+ds_name = 'MAO'
+# The node/edge labels that will not be used in the computation.
+irrelevant_labels = {'node_attrs': ['x', 'y', 'z'], 'edge_labels': ['bond_stereo']}
+
+# Initialize a Dataset.
+dataset_all = Dataset()
+# Load predefined dataset "MAO".
+dataset_all.load_predefined_dataset(ds_name)
+# Remove irrelevant labels.
+dataset_all.remove_labels(**irrelevant_labels)
+# Split the whole dataset according to the classification targets.
+datasets = split_dataset_by_target(dataset_all)
+# Get the first class of graphs, whose median preimage will be computed.
+dataset = datasets[0]
+len(dataset.graphs)
+
+"""**2. Set parameters.**"""
+
+import multiprocessing
+
+# Parameters for MedianPreimageGenerator (our method).
+mpg_options = {'init_method': 'random', # how to initialize node label cost vector. "random" means to initialize randomly.
+    'init_ecc': [4, 4, 2, 1, 1, 1], # initial edit costs.
+    'ds_name': ds_name, # name of the dataset.
+    'parallel': True, # @todo: whether the parallel scheme is to be used.
+    'time_limit_in_sec': 0, # maximum time limit to compute the preimage. If set to 0 then no limit.
+    'max_itrs': 3, # maximum iteration limit to optimize edit costs. If set to 0 then no limit.
+    'max_itrs_without_update': 3, # If the edit costs go without an update for more than this many iterations, the optimization stops.
+    'epsilon_residual': 0.01, # In optimization, the residual is only considered changed if the change is bigger than this number.
+    'epsilon_ec': 0.1, # In optimization, the edit costs are only considered changed if the changes are bigger than this number.
+    'verbose': 2 # whether to print out results.
+}
+# Parameters for graph kernel computation.
+kernel_options = {'name': 'PathUpToH', # use path kernel up to length h.
+    'depth': 9,
+    'k_func': 'MinMax',
+    'compute_method': 'trie',
+    'parallel': 'imap_unordered', # or None
+    'n_jobs': multiprocessing.cpu_count(),
+    'normalize': True, # whether to use normalized Gram matrix to optimize edit costs.
+    'verbose': 2 # whether to print out results.
+}
+# Parameters for GED computation.
+ged_options = {'method': 'BIPARTITE', # use Bipartite heuristic.
+    'initialization_method': 'RANDOM', # or 'NODE', etc.
+    'initial_solutions': 10, # when bigger than 1, then the method is considered mIPFP.
+    'edit_cost': 'CONSTANT', # @todo: not needed. use CONSTANT cost.
+    'attr_distance': 'euclidean', # @todo: not needed. the distance between non-symbolic node/edge labels is computed by euclidean distance.
+    'ratio_runs_from_initial_solutions': 1,
+    'threads': multiprocessing.cpu_count(), # parallel threads. Has no effect if mpg_options['parallel'] = False.
+    'init_option': 'LAZY_WITHOUT_SHUFFLED_COPIES' # 'EAGER_WITHOUT_SHUFFLED_COPIES'
+}
+# Parameters for MedianGraphEstimator (Boria's method).
+mge_options = {'init_type': 'MEDOID', # how to initialize the median (compute set-median). "MEDOID" uses the graph with the smallest SOD.
+    'random_inits': 10, # number of random initializations when 'init_type' = 'RANDOM'.
+    'time_limit': 600, # maximum time limit to compute the generalized median. If set to 0 then no limit.
+    'verbose': 2, # whether to print out results.
+    'refine': False # whether to refine the final SODs or not.
+}
+print('done.')
+
+"""**3. Run median preimage generator.**"""
+
+from gklearn.preimage import MedianPreimageGeneratorCML
+
+# Create median preimage generator instance.
+mpg = MedianPreimageGeneratorCML()
+# Add dataset.
+mpg.dataset = dataset
+# Set parameters.
+mpg.set_options(**mpg_options.copy())
+mpg.kernel_options = kernel_options.copy()
+mpg.ged_options = ged_options.copy()
+mpg.mge_options = mge_options.copy()
+# Run.
+mpg.run()
+
+"""**4. Get results.**"""
+
+# Get results.
+import pprint
+pp = pprint.PrettyPrinter(indent=4) # pretty print
+results = mpg.get_results()
+pp.pprint(results)
+
+# Draw generated graphs.
+def draw_graph(graph):
+    import matplotlib.pyplot as plt
+    import networkx as nx
+    plt.figure()
+    pos = nx.spring_layout(graph)
+    nx.draw(graph, pos, node_size=500, labels=nx.get_node_attributes(graph, 'atom_symbol'), font_color='w', width=3, with_labels=True)
+    plt.show()
+    plt.clf()
+    plt.close()
+
+draw_graph(mpg.set_median)
+draw_graph(mpg.gen_median)
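
The `'init_type': 'MEDOID'` comment in the file above describes the initial median as the graph with the smallest SOD (sum of distances). A minimal sketch of that selection rule on a precomputed pairwise distance matrix follows. The helper name `set_median_index` and the toy distances are hypothetical, and this is not the `MedianGraphEstimator` code, which works with graph edit distances.

```python
import numpy as np


def set_median_index(dist):
    """Return (index, SOD) of the item with the smallest sum of distances."""
    dist = np.asarray(dist, dtype=float)
    sods = dist.sum(axis=1)      # sum of distances from each item to all others
    best = int(np.argmin(sods))  # the first minimum wins in case of ties
    return best, float(sods[best])


# Toy symmetric distances between three graphs.
toy_distances = [
    [0, 1, 4],
    [1, 0, 2],
    [4, 2, 0],
]
index, sod = set_median_index(toy_distances)
print(index, sod)
```

Here the row sums are 5, 3 and 6, so the second graph (index 1) is the set median.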
Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+Created on Tue Jun 16 15:41:26 2020
+
+@author: ljia
+
+**This script demonstrates how to generate a graph preimage using Boria's method with cost matrices learning.**
+"""
+
+"""**1. Get dataset.**"""
+
+from gklearn.utils import Dataset, split_dataset_by_target
+
+# Predefined dataset name, use dataset "MAO".
+ds_name = 'MAO'
+# The node/edge labels that will not be used in the computation.
+irrelevant_labels = {'node_attrs': ['x', 'y', 'z'], 'edge_labels': ['bond_stereo']}
+
+# Initialize a Dataset.
+dataset_all = Dataset()
+# Load predefined dataset "MAO".
+dataset_all.load_predefined_dataset(ds_name)
+# Remove irrelevant labels.
+dataset_all.remove_labels(**irrelevant_labels)
+# Split the whole dataset according to the classification targets.
+datasets = split_dataset_by_target(dataset_all)
+# Get the first class of graphs, whose median preimage will be computed.
+dataset = datasets[0]
+# dataset.cut_graphs(range(0, 10))
+len(dataset.graphs)
+
+"""**2. Set parameters.**"""
+
+import multiprocessing
+
+# Parameters for MedianPreimageGenerator (our method).
+mpg_options = {'fit_method': 'k-graphs', # how to fit edit costs. "k-graphs" means use all graphs in median set when fitting.
+    'init_ecc': [4, 4, 2, 1, 1, 1], # initial edit costs.
+    'ds_name': ds_name, # name of the dataset.
+    'parallel': True, # @todo: whether the parallel scheme is to be used.
+    'time_limit_in_sec': 0, # maximum time limit to compute the preimage. If set to 0 then no limit.
+    'max_itrs': 100, # maximum iteration limit to optimize edit costs. If set to 0 then no limit.
+    'max_itrs_without_update': 3, # If the edit costs go without an update for more than this many iterations, the optimization stops.
+    'epsilon_residual': 0.01, # In optimization, the residual is only considered changed if the change is bigger than this number.
+    'epsilon_ec': 0.1, # In optimization, the edit costs are only considered changed if the changes are bigger than this number.
+    'verbose': 2 # whether to print out results.
+}
+# Parameters for graph kernel computation.
+kernel_options = {'name': 'PathUpToH', # use path kernel up to length h.
+    'depth': 9,
+    'k_func': 'MinMax',
+    'compute_method': 'trie',
+    'parallel': 'imap_unordered', # or None
+    'n_jobs': multiprocessing.cpu_count(),
+    'normalize': True, # whether to use normalized Gram matrix to optimize edit costs.
+    'verbose': 2 # whether to print out results.
+}
+# Parameters for GED computation.
+ged_options = {'method': 'BIPARTITE', # use Bipartite heuristic.
+    'initialization_method': 'RANDOM', # or 'NODE', etc.
+    'initial_solutions': 10, # when bigger than 1, then the method is considered mIPFP.
+    'edit_cost': 'CONSTANT', # use CONSTANT cost.
+    'attr_distance': 'euclidean', # the distance between non-symbolic node/edge labels is computed by euclidean distance.
+    'ratio_runs_from_initial_solutions': 1,
+    'threads': multiprocessing.cpu_count(), # parallel threads. Has no effect if mpg_options['parallel'] = False.
+    'init_option': 'LAZY_WITHOUT_SHUFFLED_COPIES' # 'EAGER_WITHOUT_SHUFFLED_COPIES'
+}
+# Parameters for MedianGraphEstimator (Boria's method).
+mge_options = {'init_type': 'MEDOID', # how to initialize the median (compute set-median). "MEDOID" uses the graph with the smallest SOD.
+    'random_inits': 10, # number of random initializations when 'init_type' = 'RANDOM'.
+    'time_limit': 600, # maximum time limit to compute the generalized median. If set to 0 then no limit.
+    'verbose': 2, # whether to print out results.
+    'refine': False # whether to refine the final SODs or not.
+}
+print('done.')
+
+"""**3. Run median preimage generator.**"""
+
+from gklearn.preimage import MedianPreimageGeneratorPy
+
+# Create median preimage generator instance.
+mpg = MedianPreimageGeneratorPy()
+# Add dataset.
+mpg.dataset = dataset
+# Set parameters.
+mpg.set_options(**mpg_options.copy())
+mpg.kernel_options = kernel_options.copy()
+mpg.ged_options = ged_options.copy()
+mpg.mge_options = mge_options.copy()
+# Run.
+mpg.run()
+
+"""**4. Get results.**"""
+
+# Get results.
+import pprint
+pp = pprint.PrettyPrinter(indent=4) # pretty print
+results = mpg.get_results()
+pp.pprint(results)
+
+# Draw generated graphs.
+def draw_graph(graph):
+    import matplotlib.pyplot as plt
+    import networkx as nx
+    plt.figure()
+    pos = nx.spring_layout(graph)
+    nx.draw(graph, pos, node_size=500, labels=nx.get_node_attributes(graph, 'atom_symbol'), font_color='w', width=3, with_labels=True)
+    plt.show()
+    plt.clf()
+    plt.close()
+
+draw_graph(mpg.set_median)
+draw_graph(mpg.gen_median)
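
The `'normalize': True` entry in `kernel_options` above asks for a normalized Gram matrix. One common way to normalize a kernel matrix is cosine normalization, `K[i, j] / sqrt(K[i, i] * K[j, j])`. Whether gklearn applies exactly this formula is an assumption, and the helper name `normalize_gram` is hypothetical. A small sketch:

```python
import numpy as np


def normalize_gram(gram):
    """Cosine-normalize a Gram matrix so that every diagonal entry becomes 1."""
    gram = np.asarray(gram, dtype=float)
    diag_sqrt = np.sqrt(np.diag(gram))
    return gram / np.outer(diag_sqrt, diag_sqrt)


# Toy Gram matrix for two graphs.
toy_gram = np.array([[4.0, 2.0],
                     [2.0, 9.0]])
normalized = normalize_gram(toy_gram)
print(normalized)
```

After normalization each graph has self-similarity 1, so kernel values of graphs with very different sizes become comparable.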
