kpop-idol-vectors

Generate word embeddings for youtube comments on Kpop idols.

[1] Overview

Word embedding projection: embeddings trained on youtube comments on videos about Korean idols.

[2] Vector similarity

[3] Data collection

Used Youtube API to collect comments on videos about each artist.

[4] Model

Used FastText pretrained Korean embedding and built vocabs from youtube_comments dataset.

[5] Visualization

Generating tensor.tsv and metadata.tsv for Embedding projection as an attempt to see neighbors of query.

query='장원영'

query='카리나'

Generating all vocab and tensors from FastText's pretrained model will lead to creating files over 10G.
Therefore, after training on youtube_comment dataset from pretrained model, I extracted only vectors of tokens in my dataset from pretrained model.

[6] Limitation

query='하니'

'하니' is a member of girl group New Jeans.
However, '하니' is one of verb conjugations of verb '하다 (do)' at the same time, and it returned a dissapointing result this time as seen above.
Pretrained model (, which built new vocabulary from comments dataset) may have more general sense, but failed to catch meaningful similar vectors of '하니'.
Since the purpose and dataset is very focused on a specific theme, it may be better to train word embedding from scratch, not by loading from pretrained embedding from FastText.

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
img		img
README.md		README.md
crawl_comments.py		crawl_comments.py
idol_list.txt		idol_list.txt
kpop-idols-w2v.ipynb		kpop-idols-w2v.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

kpop-idol-vectors

[1] Overview

[2] Vector similarity

[3] Data collection

[4] Model

[5] Visualization

[6] Limitation

About

Uh oh!

Releases

Packages

Languages

kaiyoo/kpop-idols-w2v

Folders and files

Latest commit

History

Repository files navigation

kpop-idol-vectors

[1] Overview

[2] Vector similarity

[3] Data collection

[4] Model

[5] Visualization

[6] Limitation

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages