Hi @ronit450,
Hello Everyone,
I am working on a project where I need sentence-level embeddings for Sindhi. For this I am using the available pretrained Word2Vec model, as described in the sample code. The sample code only covers word-level embeddings, whereas I want a single embedding for the entire sentence. Any pooling strategy would do, such as averaging. However, I am facing issues in my pipeline:
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Use WordEmbeddings instead of WordEmbeddingsModel
word_embeddings = WordEmbeddingsModel.pretrained("w2v_cc_300d", "sd") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

# Use SentenceEmbeddings for obtaining sentence embeddings
sentence_embeddings = SentenceEmbeddings() \
    .setInputCols(["document", "word_embeddings"]) \
    .setOutputCol("sentence_embeddings") \
    .setPoolingStrategy("AVERAGE")

pipeline = Pipeline(stages=[documentAssembler, tokenizer, word_embeddings, sentence_embeddings])

data = spark.createDataFrame([["مون کي اسپارڪ اين ايل پي سان پيار آهي"]]).toDF("text")
result = pipeline.fit(data).transform(data)

# Extract the final embeddings
sentence_embeddings = result.select("sentence_embeddings.result").first()[0]
print(sentence_embeddings)
The error is:
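The error text itself did not come through in this post. One mismatch is visible in the pipeline as posted, though: `SentenceEmbeddings` is asked to read a column named `word_embeddings`, while the `WordEmbeddingsModel` stage writes to `embeddings`. The sketch below is plain Python, not Spark NLP code. The `STAGES` list and the `unresolved_inputs` helper are hypothetical names introduced here only to mirror the column names from the post and show which input has no producer. Whether this mismatch is the cause of the reported error is an assumption.

```python
# Hypothetical, Spark-free mirror of the pipeline's column wiring.
# Each entry is (stage name, input columns, output column), copied from the post.
STAGES = [
    ("DocumentAssembler", ["text"], "document"),
    ("Tokenizer", ["document"], "token"),
    ("WordEmbeddingsModel", ["document", "token"], "embeddings"),
    ("SentenceEmbeddings", ["document", "word_embeddings"], "sentence_embeddings"),
]


def unresolved_inputs(stages, source_columns):
    """Return (stage, column) pairs whose input column no earlier stage produces."""
    available = set(source_columns)
    missing = []
    for name, inputs, output in stages:
        for column in inputs:
            if column not in available:
                missing.append((name, column))
        # The stage's output becomes available to every later stage.
        available.add(output)
    return missing


problems = unresolved_inputs(STAGES, source_columns=["text"])
print(problems)  # [('SentenceEmbeddings', 'word_embeddings')]

# Same wiring with the last stage reading the column the embeddings stage writes.
fixed = STAGES[:3] + [
    ("SentenceEmbeddings", ["document", "embeddings"], "sentence_embeddings"),
]
print(unresolved_inputs(fixed, source_columns=["text"]))  # []
```

If that is the cause, changing the call to `.setInputCols(["document", "embeddings"])` would line the stages up. Separately, the `result` field of a Spark NLP annotation holds text rather than vectors, so selecting `sentence_embeddings.embeddings` instead of `sentence_embeddings.result` is likely what is wanted for the final extraction step.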
