Entity Resolution Methods: Learned Representations with a Classic Neural Network, and Meta-Blocking Combined with a Clustering Method
This repo contains a set of experiments with different architectures, exploring how well they handle entity resolution problems: deduplicating records and mapping diverse records to their corresponding entities.
The datasets for these experiments consist of piano models from different brands.
The project is broken down by directory, with each directory representing a different approach:
- block_klsh: An entity resolution experiment based on Meta-Blocking and KLSH. The proposed approach has three sequential stages:
  - First, a hierarchical graph is generated for the records in each block, according to the blocking rules.
  - The resulting record pair relationships are then used to build a record relationship graph made up of components.
  - Finally, the records in each component are clustered with a K-Means algorithm, an approach referred to here, following existing literature, as KLSH.
  - The methodology and its implementation are described in more detail in the publication "Entity Resolution: Meta-Blocking and KLSH".
  - The code for the project is here
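The three block_klsh stages can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in the repo: the toy records, the three blocking rules, the `min_shared` pruning threshold, the character-bigram vectors and the `records_per_entity` heuristic for choosing k are all hypothetical, and a flat weighted pair graph stands in for the hierarchical graph described above.

```python
from collections import defaultdict
from itertools import combinations
from math import ceil

# Hypothetical toy records; not the repo's dataset.
RECORDS = [
    "Steinway Model D",   # 0
    "steinway modell d",  # 1 (misspelt duplicate of 0)
    "Steinway Model B",   # 2
    "Yamaha C3",          # 3
    "YAMAHA C-3",         # 4 (duplicate of 3)
    "Kawai K3",           # 5
]

def compact(text):
    """Lower-case the text and keep only letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())

# Illustrative blocking rules: each maps a record to one blocking key.
RULES = {
    "brand": lambda t: t.lower().split()[0],
    "prefix": lambda t: compact(t)[:3],
    "suffix": lambda t: compact(t)[-1],
}

def candidate_components(records, min_shared=2):
    """Stages 1 and 2: block the records, weight each pair by the number of
    blocks it shares, prune weak pairs, and return the connected components."""
    blocks = defaultdict(set)
    for rid, text in enumerate(records):
        for name, rule in RULES.items():
            blocks[(name, rule(text))].add(rid)
    weights = defaultdict(int)
    for members in blocks.values():
        for pair in combinations(sorted(members), 2):
            weights[pair] += 1
    parent = list(range(len(records)))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), weight in weights.items():
        if weight >= min_shared:
            parent[find(a)] = find(b)
    groups = defaultdict(list)
    for rid in range(len(records)):
        groups[find(rid)].append(rid)
    return sorted(groups.values())

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centres = [vectors[0]]
    while len(centres) < k:
        centres.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centres)))
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(v, centres[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def resolve(records, records_per_entity=2):
    """Stage 3: run k-means on bigram-count vectors inside each component."""
    clusters = []
    for component in candidate_components(records):
        texts = [compact(records[r]) for r in component]
        vocab = sorted({t[i:i + 2] for t in texts for i in range(len(t) - 1)})
        vectors = [[sum(t[i:i + 2] == g for i in range(len(t) - 1)) for g in vocab]
                   for t in texts]
        k = ceil(len(component) / records_per_entity)  # hypothetical choice of k
        labels = kmeans(vectors, k)
        for j in range(k):
            members = [r for r, lab in zip(component, labels) if lab == j]
            if members:
                clusters.append(members)
    return clusters

print(resolve(RECORDS))  # → [[0, 1], [2], [3, 4], [5]]
```

In this toy run the `suffix` rule places the Kawai record in the same block as the two Yamaha records, but those pairs share only one block and are pruned, so K-Means only ever runs inside the small components that survive.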
- siameselike_encoder: An entity resolution experiment based on an encoder and a clustering algorithm.
  - We create and train an encoder network on transformed tabular entity feature data.
  - The feed-forward network, trained on tabular data, learns representations from the record features. These representations serve as inputs to a clustering algorithm that separates entities in the representation space.
  - The methodology and its implementation are described in more detail in the publication "Entity Resolution: Learned Representations of Tabular Data with Classic Neural Networks".
  - The code for the project is here
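The encoder-plus-clustering idea can be sketched as below. This is a rough illustration under stated assumptions rather than the repo's implementation: the synthetic data, the layer sizes, the pairwise contrastive loss with a margin, the learning rate and the use of K-Means on the embeddings are all hypothetical choices, and NumPy with hand-written gradients stands in for whatever framework the project uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 3 entities, 20 noisy records each, 6 numeric features.
entity_centres = np.repeat(4.0 * np.eye(3), 2, axis=1)  # shape (3, 6)
X = np.vstack([c + rng.normal(0.0, 0.5, size=(20, 6)) for c in entity_centres])
y = np.repeat(np.arange(3), 20)
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise the features

def forward(X, W1, W2):
    """Feed-forward encoder: one tanh hidden layer, then a linear embedding."""
    H = np.tanh(X @ W1)
    return H, H @ W2

def train(X, y, dim=2, hidden=16, margin=2.0, lr=0.01, steps=1000, batch=64):
    """Train both weight matrices on random record pairs (siamese-style)."""
    W1 = rng.normal(0.0, 0.3, size=(X.shape[1], hidden))
    W2 = rng.normal(0.0, 0.3, size=(hidden, dim))
    for _ in range(steps):
        ia = rng.integers(0, len(X), size=batch)
        ib = rng.integers(0, len(X), size=batch)
        same = (y[ia] == y[ib])[:, None]
        Ha, Za = forward(X[ia], W1, W2)
        Hb, Zb = forward(X[ib], W1, W2)
        D = Za - Zb
        dist = np.linalg.norm(D, axis=1, keepdims=True) + 1e-9
        # Contrastive loss: dist^2 for matching pairs,
        # max(0, margin - dist)^2 for non-matching pairs.
        gap = np.maximum(0.0, margin - dist)
        G = np.where(same, 2.0 * D, -2.0 * gap * D / dist) / batch  # dLoss/dZa
        dHa = (G @ W2.T) * (1.0 - Ha ** 2)   # backprop through tanh, branch a
        dHb = (-G @ W2.T) * (1.0 - Hb ** 2)  # dLoss/dZb is -G, branch b
        grad_W2 = Ha.T @ G - Hb.T @ G
        grad_W1 = X[ia].T @ dHa + X[ib].T @ dHb
        W1 -= lr * grad_W1
        W2 -= lr * grad_W2
    return W1, W2

def kmeans(Z, k, iters=50):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centres = [Z[0]]
    for _ in range(k - 1):
        d = np.min([np.sum((Z - c) ** 2, axis=1) for c in centres], axis=0)
        centres.append(Z[np.argmax(d)])
    C = np.array(centres)
    for _ in range(iters):
        labels = np.argmin(((Z[:, None, :] - C[None]) ** 2).sum(axis=2), axis=1)
        C = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    return labels

W1, W2 = train(X, y)
_, Z = forward(X, W1, W2)  # learned representations, shape (60, 2)
labels = kmeans(Z, k=3)    # cluster ids assigned in the representation space
print(Z.shape, np.bincount(labels, minlength=3))
```

Both records of a pair pass through the same weights, which is what makes the setup siamese-like; after training, only the single encoder is needed to embed new records before clustering.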
This project is licensed under the MIT License. See LICENSE.txt for more information.
For questions or collaborations, please reach out to sergiosolorzano@gmail.com.
If you find this helpful, you can buy me a coffee :)