spaCy Model for Column auto mapping #5434
vagarwal77
started this conversation in
Help: Best practices
-
From your question, it is also not entirely clear to me which NLP challenge you're trying to tackle. Automatically mapping field names in databases feels like a "dangerous" thing to do with NLP alone. In your example, a connection is made between database fields that have the same name (and presumably the same type), but that doesn't seem to require NLP. Either way, I think you'll need to implement a custom algorithm suited to your use case.
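To illustrate the point that connecting fields with the same name does not need NLP, here is a minimal sketch of a name-only baseline. It assumes that "the same name" can be approximated by ignoring case, spaces and punctuation; the helper names `normalize` and `match_by_name` and the sample column names are hypothetical and not taken from the question.

```python
import re


def normalize(name: str) -> str:
    """Lowercase a column name and drop every non-alphanumeric character."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def match_by_name(src_columns, dest_columns):
    """Map each source column to the destination column with the same normalized name.

    Source columns without an identically named counterpart are left out.
    """
    dest_lookup = {normalize(col): col for col in dest_columns}
    return {
        col: dest_lookup[normalize(col)]
        for col in src_columns
        if normalize(col) in dest_lookup
    }


# Hypothetical column names, for illustration only
src = ["Patient_ID", "birth date", "Gender"]
dest = ["patient_id", "BirthDate", "sex"]

name_matches = match_by_name(src, dest)
print(name_matches)
```

Anything this baseline leaves unmatched (here `Gender` versus `sex`) is where a custom, domain-specific rule set or a reviewed synonym list would have to take over.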
-
I am working on a healthcare project where we receive dataset schemas from different providers. Each dataset has approximately 100 tables, and each table has approximately 20 columns, so in total about 100 * 20 = 2000 entities.
We need to map these 2000 entities to a standard Common Data Model.
Performing this mapping manually is tedious and error-prone, so I would like to use the semantics (column name, column data type, length, or column description) to map the columns between the two data sources automatically.
I am trying to develop a semantic model that automatically maps the columns between two tables based on the column name.
I have sample mapping data for training purposes, like below -
The algorithm I am looking for is -
Find the column-to-column mappings based on the training dataset, loaded as below -
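import pandas as pd

# `mapping` is the path to the CSV file holding the known source-to-destination column mappings
df = pd.read_csv(mapping, sep=',',
                 usecols=['src_column', 'src_table', 'src_column_length', 'src_column_type', 'src_column_desc',
                          'dest_column', 'dest_table', 'dest_column_length', 'dest_column_type', 'dest_column_desc'],
                 encoding='utf8')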
I can also feed in the 1000 records from the table to provide more insight into the column contents.
Now, if I provide two distinct tables with their columns, I would like the model to give me the mapping between the columns from both tables.
Please suggest which model we need to load -
should I use nlp = spacy.load("en")?
Any code sample for the above would be a great help. I am using Python 3.
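One possible starting point for the matching described above, as a rough sketch rather than a recommended solution: score every source/destination column pair on name similarity plus data type agreement, and suggest the best-scoring destination for each source column. It uses only the standard library (`difflib.SequenceMatcher`) instead of a spaCy model. The function names, the 0.8 name weight, the 0.6 threshold and the sample columns are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def column_score(src, dest, name_weight=0.8):
    """Score a source/destination column pair between 0 and 1.

    Combines character-level name similarity with an exact data type check.
    The weighting is an arbitrary illustrative choice.
    """
    name_sim = SequenceMatcher(None, src["name"].lower(), dest["name"].lower()).ratio()
    type_sim = 1.0 if src["type"] == dest["type"] else 0.0
    return name_weight * name_sim + (1 - name_weight) * type_sim


def suggest_mappings(src_columns, dest_columns, threshold=0.6):
    """For each source column, suggest the best-scoring destination column.

    Returns {source name: (destination name, score)} or None when no
    destination reaches the threshold, so weak matches go to manual review.
    """
    suggestions = {}
    for src in src_columns:
        best = max(dest_columns, key=lambda dest: column_score(src, dest))
        score = column_score(src, best)
        suggestions[src["name"]] = (best["name"], round(score, 2)) if score >= threshold else None
    return suggestions


# Hypothetical columns, for illustration only
src_table = [
    {"name": "pat_birth_dt", "type": "date"},
    {"name": "pat_gender", "type": "varchar"},
    {"name": "zip", "type": "varchar"},
]
dest_table = [
    {"name": "birth_date", "type": "date"},
    {"name": "gender", "type": "varchar"},
    {"name": "person_id", "type": "integer"},
]

suggestions = suggest_mappings(src_table, dest_table)
for src_name, match in suggestions.items():
    print(src_name, "->", match)
```

Returning `None` below the threshold keeps uncertain pairs out of the automatic mapping, which matters for healthcare data where a wrong mapping is costlier than a missing one. Column length, descriptions and sampled values could be added as further terms in `column_score`.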