
Commit b8ba42b

rename & update readme

Parent: abcf6f6

14 files changed, +93 -19 lines changed

Cargo.toml

Lines changed: 2 additions & 2 deletions

@@ -1,5 +1,5 @@
 [package]
-name = "cleora-python"
+name = "pycleora"
 version = "2.0.0"
 edition = "2018"
 license-file = "LICENSE"
@@ -8,7 +8,7 @@ documentation = "https://github.com/synerise/cleora"
 homepage = "https://github.com/synerise/cleora"
 repository = "https://github.com/synerise/cleora"
 description = """
-Sparse graph structure and markov-propagation on embeddings exposed via python bindings
+Sparse hypergraph structure and Markov propagation for node embeddings, exposed via Python bindings.
 """
 
 [lib]

README.md

Lines changed: 75 additions & 8 deletions

@@ -21,6 +21,81 @@ _**Cleora** is a genus of moths in the family **Geometridae**. Their scientific
 
 Cleora is a general-purpose model for efficient, scalable learning of stable and inductive entity embeddings for heterogeneous relational data.
 
+**Cleora** is now available as a Python package, _pycleora_. Key improvements compared to the previous version:
+* _performance optimizations_: 10x faster embedding times
+* _performance optimizations_: reduced memory usage
+* _latest research_: significantly improved embedding quality
+* _new feature_: can create graphs from a Python iterator in addition to TSV files
+* _new feature_: seamless integration with _NumPy_
+* _new feature_: item attribute support via custom embedding initialization
+* _new feature_: adjustable vector projection / normalization after each propagation step
+
+**Breaking changes:**
+* the _transient_ modifier is no longer supported; creating _complex::reflexive_ columns for hypergraph embeddings, grouped by the transient entity, gives better results.
+
+
+**Example usage:**
+
+```
+import pycleora
+import numpy as np
+import pandas as pd
+import random
+
+# Generate example data
+customers = [f"Customer_{i}" for i in range(1, 20)]
+products = [f"Product_{j}" for j in range(1, 20)]
+
+data = {
+    "customer": random.choices(customers, k=100),
+    "product": random.choices(products, k=100),
+}
+
+# Create DataFrame
+df = pd.DataFrame(data)
+
+# Create hyperedges
+customer_products = df.groupby('customer')['product'].apply(list).values
+
+# Convert to Cleora input format
+cleora_input = map(lambda x: ' '.join(x), customer_products)
+
+# Create Markov transition matrix for the hypergraph
+mat = pycleora.SparseMatrix.from_iterator(cleora_input, columns='complex::reflexive::product')
+
+# Look at entity ids in the matrix, corresponding to embedding vectors
+print(mat.entity_ids)
+# e.g. ['Product_5', 'Product_3', 'Product_2', ...]; the exact ids and order depend on the sampled data
+
+# Initialize embedding vectors externally, e.g. from text or image embeddings or random vectors
+# embeddings = ...
+
+# Or use built-in random deterministic initialization
+embeddings = mat.initialize_deterministically(1024)
+
+# Perform a Markov random-walk step followed by normalization, NUM_WALKS times
+
+NUM_WALKS = 3 # The optimal number depends on the graph; values between 3 and 7 typically yield good results
+# lower values tend to capture co-occurrence, higher values capture substitutability in a context
+
+for _ in range(NUM_WALKS):
+    # Can propagate with a symmetric matrix as well, but left Markov is a great default
+    embeddings = mat.left_markov_propagate(embeddings)
+    # Normalize with L2 norm by default, for the embeddings to reside on a hypersphere. Can use standardization instead.
+    embeddings /= np.linalg.norm(embeddings, ord=2, axis=-1, keepdims=True)
+
+# We're done, here are our embeddings
+
+for entity, embedding in zip(mat.entity_ids, embeddings):
+    print(entity, embedding)
+
+# We can now compare our embeddings with dot product (since they are L2 normalized)
+
+print(np.dot(embeddings[0], embeddings[1]))
+print(np.dot(embeddings[0], embeddings[2]))
+print(np.dot(embeddings[0], embeddings[3]))
+```
+
 **Read the whitepaper ["Cleora: A Simple, Strong and Scalable Graph Embedding Scheme"](https://arxiv.org/abs/2102.02302)**
 
 Cleora embeds entities in *n-dimensional spherical spaces* utilizing extremely fast stable, iterative random projections, which allows for unparalleled performance and scalability.
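
The breaking change in this README diff replaces the _transient_ modifier with _complex::reflexive_ columns grouped by the transient entity. The README example does that grouping with pandas. The following is a dependency-free sketch of the same step. `group_into_hyperedges` is a hypothetical helper, not part of pycleora, and it assumes item ids contain no whitespace, because each hyperedge is emitted as one space-separated line.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_into_hyperedges(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Group (transient, item) pairs into one space-separated line per transient entity.

    Each line lists the items that share a transient entity (e.g. all products
    bought by one customer), mirroring the README's groupby + ' '.join step.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for transient, item in pairs:
        groups[transient].append(item)
    # dicts keep insertion order, so lines follow the first appearance of each transient entity
    return [" ".join(items) for items in groups.values()]


pairs = [
    ("Customer_1", "Product_3"),
    ("Customer_2", "Product_1"),
    ("Customer_1", "Product_7"),
]
print(group_into_hyperedges(pairs))  # ['Product_3 Product_7', 'Product_1']
```

The resulting list could take the place of `cleora_input` in the README example, i.e. be passed to `pycleora.SparseMatrix.from_iterator(..., columns='complex::reflexive::product')`.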
@@ -166,14 +241,6 @@ The technical properties described above imply good production-readiness of Cleo
 
 More information can be found in [the full documentation](https://cleora.readthedocs.io/).
 
-## Cleora Enterprise
-**Cleora Enterprise** is now available for selected customers. Key improvements in addition to this open-source version:
-* _performance optimizations_: 10x faster embedding times
-* _latest research_: significantly improved embedding quality
-* _new feature_: item attributes support
-* _new feature_: multimodal fusion of multiple graphs, text and image embeddings
-* _new feature_: compressed embeddings in various formats (spherical, hyperbolic, sparse)
-
 For details contact us at cleora@synerise.com
 
 ## Cite
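
To make the propagation loop in the README example easier to follow, here is a NumPy-only sketch of the same idea on a toy hypergraph. It is not pycleora's implementation. It assumes that items sharing a hyperedge are linked pairwise (each item also counted with itself, which keeps every row non-zero) and that "left Markov" propagation means multiplying the embeddings on the left by the row-normalized matrix D^-1 A. The helpers `transition_matrix`, `propagate` and `most_similar` are hypothetical, and the seeded random initialization only stands in for `initialize_deterministically`.

```python
import numpy as np


def transition_matrix(hyperedges):
    """Clique-expand space-separated hyperedges into a row-stochastic matrix.

    Assumption: items in the same hyperedge are linked pairwise, and each item
    is also counted with itself, so no row sums to zero.
    """
    entity_ids = sorted({item for line in hyperedges for item in line.split()})
    index = {item: i for i, item in enumerate(entity_ids)}
    counts = np.zeros((len(entity_ids), len(entity_ids)))
    for line in hyperedges:
        members = [index[item] for item in line.split()]
        for a in members:
            for b in members:
                counts[a, b] += 1.0
    # Divide each row by its sum so every row is a probability distribution (D^-1 A)
    return entity_ids, counts / counts.sum(axis=1, keepdims=True)


def propagate(transition, embeddings, num_walks=3):
    """Repeat num_walks times: one Markov step, then L2-normalize every row."""
    for _ in range(num_walks):
        embeddings = transition @ embeddings
        embeddings = embeddings / np.linalg.norm(embeddings, ord=2, axis=-1, keepdims=True)
    return embeddings


def most_similar(embeddings, entity_ids, query, k=3):
    """Rank other entities by dot product, which equals cosine similarity for unit rows."""
    q = entity_ids.index(query)
    scores = embeddings @ embeddings[q]
    order = [i for i in np.argsort(-scores) if i != q]
    return [(entity_ids[i], float(scores[i])) for i in order[:k]]


hyperedges = ["Product_1 Product_2", "Product_2 Product_3", "Product_1 Product_2 Product_4"]
ids, T = transition_matrix(hyperedges)
rng = np.random.default_rng(0)  # fixed seed, so the random initialization is reproducible
init = rng.standard_normal((len(ids), 16))
emb = propagate(T, init, num_walks=3)
print(most_similar(emb, ids, "Product_1"))
```

A dense matrix keeps the sketch short. Real graphs need a sparse representation, which is what `SparseMatrix` provides.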

examples/cleora_loop.py

Lines changed: 1 addition & 1 deletion

@@ -1,7 +1,7 @@
 import time
 
 import numpy as np
-from cleora_python import SparseMatrix
+from pycleora import SparseMatrix
 
 start_time = time.time()
 
examples/column_indices.py

Lines changed: 1 addition & 1 deletion

@@ -1,5 +1,5 @@
 import numpy as np
-from cleora_python import SparseMatrix
+from pycleora import SparseMatrix
 
 hyperedges = [
     'a\t1',

examples/from_iterator.py

Lines changed: 1 addition & 1 deletion

@@ -1,7 +1,7 @@
 import time
 
 import numpy as np
-from cleora_python import SparseMatrix
+from pycleora import SparseMatrix
 
 start_time = time.time()
 

examples/graph_pickle.py

Lines changed: 1 addition & 1 deletion

@@ -1,7 +1,7 @@
 import time
 
 import numpy as np
-from cleora_python import SparseMatrix
+from pycleora import SparseMatrix
 
 import pickle
 

examples/predefined_cleora_loop.py

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 import time
 
-from cleora_python import embed_using_baseline_cleora, SparseMatrix
+from pycleora import embed_using_baseline_cleora, SparseMatrix
 
 start_time = time.time()
 graph = SparseMatrix.from_files(["perf_inputs/0.tsv", "perf_inputs/1.tsv", "perf_inputs/2.tsv", "perf_inputs/3.tsv", "perf_inputs/4.tsv", "perf_inputs/5.tsv", "perf_inputs/6.tsv", "perf_inputs/7.tsv"], "complex::reflexive::name")
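
Every example file in this commit changes only its import line, from `cleora_python` to `pycleora`. Scripts that must run against both old and new installs could resolve the module at import time. The following is a minimal sketch built on the standard `importlib.import_module`. `import_first` is a hypothetical helper. The sketch also assumes that the old `cleora_python` module exposes the same names (such as `SparseMatrix`) as `pycleora`. The unchanged example bodies suggest this, but they do not guarantee it.

```python
import importlib
from types import ModuleType


def import_first(*module_names: str) -> ModuleType:
    """Import and return the first module in module_names that can be imported."""
    last_error = None
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as err:  # ModuleNotFoundError is a subclass of ImportError
            last_error = err
    raise ImportError(f"none of {module_names} could be imported") from last_error


# Usage (requires pycleora or the older cleora_python to be installed):
# SparseMatrix = import_first("pycleora", "cleora_python").SparseMatrix
```

Listing the new name first means the fallback is only used on environments that still have the pre-rename package.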
File renamed without changes.

cleora_python/__init__.py renamed to pycleora/__init__.py

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 import numpy as np
 
-from .cleora import SparseMatrix
+from .pycleora import SparseMatrix
 
 def embed_using_baseline_cleora(graph, feature_dim: int, iter: int):
     embeddings = graph.initialize_deterministically(feature_dim)
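
`embed_using_baseline_cleora` starts from `graph.initialize_deterministically(feature_dim)`. The updated README also mentions initializing embeddings externally (from text, image, or random vectors) to support item attributes. The sketch below assembles such a matrix, with rows in the same order as `mat.entity_ids`, which the README pairs with the embedding rows. `align_initial_embeddings` is hypothetical. The `float32` default is an assumption, so in practice match the dtype that `initialize_deterministically` returns.

```python
import numpy as np


def align_initial_embeddings(entity_ids, external_vectors, dim, seed=0, dtype=np.float32):
    """Return a (len(entity_ids), dim) matrix whose row i is the vector for entity_ids[i].

    Entities found in external_vectors (e.g. text or image embeddings keyed by id)
    use that vector; all others keep a seeded, hence reproducible, random row.
    """
    rng = np.random.default_rng(seed)
    matrix = rng.standard_normal((len(entity_ids), dim)).astype(dtype)
    for row, entity in enumerate(entity_ids):
        vector = external_vectors.get(entity)
        if vector is None:
            continue  # no external vector: keep the random row
        vector = np.asarray(vector, dtype=dtype)
        if vector.shape != (dim,):
            raise ValueError(f"vector for {entity!r} has shape {vector.shape}, expected ({dim},)")
        matrix[row] = vector
    return matrix
```

With a hypothetical `vectors_by_id` dict, usage would be `embeddings = align_initial_embeddings(mat.entity_ids, vectors_by_id, dim=1024)`, followed by the same propagate-and-normalize loop as in the README.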
Binary file not shown.
