Entity Resolution projects with Tabular Data: One combines learned representations generated by a siamese-like feed-forward neural network and a clustering algorithm; another combines meta-blocking with a clustering method.

sergiosolorzano/entity_resolution

Entity Resolution Methods: Learned Representations with a Classic Neural Network, and Meta-Blocking Combined with a Clustering Method

This repo contains a set of experiments that explore how well different architectures handle entity resolution problems: deduplicating records and mapping diverse records to their corresponding entities.

The datasets for these experiments describe piano models from different brands.

The project is broken down by directory, with each directory representing a different approach:

  1. block_klsh: An entity resolution experiment based on Meta-Blocking and KLSH. The proposed approach has three sequential stages:

    • First, a hierarchical graph is generated for the records in blocks, according to the blocking rules.
    • The resulting record-pair relationships are then used to build a record relationship graph with components.
    • Finally, the records in each component are clustered with a K-Means algorithm, an approach that, following existing literature, we refer to as KLSH.
    • The methodology and its implementation are described in more detail in the publication "Entity Resolution: Meta-Blocking and KLSH".
    • The code for the project is in the block_klsh directory.
  2. siameselike_encoder: An entity resolution experiment based on an encoder and a clustering algorithm.

    • We create and train an encoder network on transformed tabular entity feature data.
    • The feed-forward network learns representations from record features; these representations serve as inputs to a clustering algorithm that separates entities in the representation space.
    • The methodology and its implementation are described in more detail in the publication "Entity Resolution: Learned Representations of Tabular Data with Classic Neural Networks".
    • The code for the project is in the siameselike_encoder directory.
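
To make the three stages of the block_klsh approach more concrete, here is a minimal standard-library sketch. It is not the repository's code. The toy records, the `blocking_keys` rules, and the `choose_k` rule are hypothetical, the hierarchical block graph is flattened into a plain dictionary of blocks, and the small `kmeans` helper stands in for whatever K-Means implementation the project actually uses.

```python
from collections import defaultdict

# Hypothetical toy records: (id, brand, model, numeric features). Ids equal list positions.
records = [
    (0, "alpha", "u1", (121.0, 1.0)),
    (1, "alpha", "u1 upright", (121.5, 1.0)),
    (2, "alpha", "g3 grand", (186.0, 2.0)),
    (3, "beta", "k300", (122.0, 1.0)),
    (4, "beta", "k-300", (122.2, 1.0)),
]

def blocking_keys(rec):
    """Hypothetical blocking rules: same brand, or same first letter of the model."""
    _, brand, model, _ = rec
    return [("brand", brand), ("prefix", model[:1])]

def connected_components(n, pairs):
    """Union-find over record pairs; returns the components as sorted lists of ids."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return sorted(groups.values())

def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Tiny deterministic K-Means: farthest-point initialisation, then Lloyd updates."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def resolve(records, choose_k):
    # Stage 1: group records into blocks according to the blocking rules.
    blocks = defaultdict(list)
    for rec in records:
        for key in blocking_keys(rec):
            blocks[key].append(rec[0])
    # Stage 2: records sharing a block form pairs; the pairs define graph components.
    pairs = {(a, b) for ids in blocks.values() for i, a in enumerate(ids) for b in ids[i + 1:]}
    entities = []
    # Stage 3: cluster the records of each component with K-Means.
    for comp in connected_components(len(records), pairs):
        labels = kmeans([records[i][3] for i in comp], choose_k(comp))
        clusters = defaultdict(list)
        for rec_id, label in zip(comp, labels):
            clusters[label].append(rec_id)
        entities.extend(sorted(clusters.values()))
    return entities

# Hypothetical rule for the number of clusters per component.
entities = resolve(records, lambda comp: 2 if len(comp) > 2 else 1)
print(entities)
```

In this sketch the blocking stage only decides which records can end up in the same entity; the split inside each component is left entirely to K-Means on the numeric features.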

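For the siameselike_encoder approach, the idea of a shared encoder that maps records into a representation space can be sketched as follows. This is an illustration under stated assumptions, not the repository's model: the toy features, the single tanh layer, the contrastive loss with `MARGIN`, and the plain gradient-descent loop are all choices made here for brevity. The sketch covers only the encoder training; the clustering step that would consume the embeddings is left out.

```python
import math

# Hypothetical toy features: two records per entity (A and B), already scaled to [0, 1].
features = {
    "a1": (0.10, 0.20), "a2": (0.15, 0.22),
    "b1": (0.90, 0.80), "b2": (0.88, 0.85),
}
# Training pairs: label 1 = same entity, 0 = different entities.
pairs = [("a1", "a2", 1), ("b1", "b2", 1),
         ("a1", "b1", 0), ("a1", "b2", 0), ("a2", "b1", 0), ("a2", "b2", 0)]
MARGIN = 1.0  # assumed margin for the contrastive loss

def encode(W, b, x):
    """Single tanh layer; the same weights encode both records of a pair."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi) for row, bi in zip(W, b)]

def distance(z1, z2):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(z1, z2)))

def loss_and_grads(W, b):
    """Mean contrastive loss over all pairs and its gradients w.r.t. W and b."""
    gW = [[0.0] * len(W[0]) for _ in W]
    gb = [0.0] * len(b)
    total = 0.0
    for left, right, same in pairs:
        x1, x2 = features[left], features[right]
        z1, z2 = encode(W, b, x1), encode(W, b, x2)
        d = distance(z1, z2)
        if same:
            total += d ** 2
            scale = 2.0                      # d(d^2)/dz1 = 2 (z1 - z2)
        else:
            gap = max(0.0, MARGIN - d)
            total += gap ** 2
            scale = -2.0 * gap / (d + 1e-9)  # d(gap^2)/dz1 = -2 gap (z1 - z2) / d
        for i in range(len(b)):
            g = scale * (z1[i] - z2[i])      # gradient w.r.t. z1[i]; z2[i] receives -g
            a1 = g * (1.0 - z1[i] ** 2)      # back through tanh, first record
            a2 = -g * (1.0 - z2[i] ** 2)     # back through tanh, second record
            gb[i] += a1 + a2
            for j in range(len(W[0])):
                gW[i][j] += a1 * x1[j] + a2 * x2[j]
    n = len(pairs)
    return total / n, [[v / n for v in row] for row in gW], [v / n for v in gb]

def train(steps=300, lr=0.1):
    W = [[0.1, 0.0], [0.0, 0.1]]  # small deterministic initial weights
    b = [0.0, 0.0]
    initial_loss, _, _ = loss_and_grads(W, b)
    for _ in range(steps):
        _, gW, gb = loss_and_grads(W, b)
        W = [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(W, gW)]
        b = [bi - lr * g for bi, g in zip(b, gb)]
    final_loss, _, _ = loss_and_grads(W, b)
    return W, b, initial_loss, final_loss

W, b, initial_loss, final_loss = train()
embeddings = {name: encode(W, b, x) for name, x in features.items()}
d_same = distance(embeddings["a1"], embeddings["a2"])
d_diff = distance(embeddings["a1"], embeddings["b1"])
print(f"loss {initial_loss:.3f} -> {final_loss:.3f}; same-entity {d_same:.3f}, different {d_diff:.3f}")
```

After training, records of the same entity should sit closer together in the embedding space than records of different entities, which is the property a downstream clustering algorithm relies on.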
 

License

This project is licensed under the MIT License. See LICENSE.txt for more information.

 

Contact

For questions or collaborations please reach out to sergiosolorzano@gmail.com

 

If you find this helpful you can buy me a coffee :)
