Hello, @dkbs12! I've prepared a Colab notebook for you with a minimal version of the tutorial that works on your files. Since your files are in different formats, I modified the installation command accordingly. These are the results I get:
To better understand how to handle different file types, I suggest having a look at the Preprocessing tutorial. I also suggest splitting your documents using the
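As a minimal sketch of the splitting idea, assuming Haystack 1.x and files that are already available as plain text: the hypothetical helper `files_to_doc_dicts` below splits each `.txt` file on blank lines and stores the source file name under `meta["name"]` for every chunk, so the name survives the split. PDF or DOCX files would first need a converter, which this sketch does not cover.

```python
from pathlib import Path
from typing import Dict, List


def files_to_doc_dicts(dir_path: str) -> List[Dict]:
    """Hypothetical helper: read every .txt file under dir_path, split it
    into paragraphs, and attach the source file name to each paragraph."""
    docs: List[Dict] = []
    for path in sorted(Path(dir_path).glob("**/*.txt")):
        text = path.read_text(encoding="utf-8")
        # Paragraphs are assumed to be separated by a blank line.
        for para in text.split("\n\n"):
            if not para.strip():  # skip empty paragraphs
                continue
            # Keep the file name in meta so it is still there after indexing.
            docs.append({"content": para.strip(), "meta": {"name": path.name}})
    return docs
```

The resulting dicts could then be indexed with `document_store.write_documents(docs)` as in the tutorial, since Haystack 1.x document stores accept dicts with `content` and `meta` keys.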
Hello,
I'm studying the tutorial "Tutorial: How to Use Pipelines" and I have some questions about it.
First, when I run the code below on the documents provided by Haystack ("wiki_gameofthrones_txt11.zip"), I can see the file name of each result:
< Coding >
from haystack.pipelines import DocumentSearchPipeline
from haystack.utils import print_documents
# embedding_retriever is the retriever defined earlier in the tutorial
p_retrieval = DocumentSearchPipeline(embedding_retriever)
res = p_retrieval.run(query="Who is the father of Arya Stark?", params={"Retriever": {"top_k": 10}})
print_documents(res, max_text_len=200)
< Result >
Query: Who is the father of Arya Stark?
{ 'content': '\n'
'=== Background ===\n'
'Arya is the third child and younger daughter of Eddard and '
'Catelyn Stark and is nine years old at the beginning of the '
'book series. She has five siblings: an older brother Robb, '
'a...',
'name': '43_Arya_Stark.txt'}
{ 'content': '\n'
'===Arya Stark===\n'
"'''Arya Stark''' portrayed by Maisie Williams. Arya Stark of "
'House Stark is the younger daughter and third child of Lord '
'Eddard and Catelyn Stark of Winterfell. Ever the tomboy, '
'Arya...',
'name': '349_List_of_Game_of_Thrones_characters.txt'}
However, when I use my own documents ("Phase1_test_data.zip"), I can't find the file names in the results: every name is shown as "None".
Here are the documents, the code, and the result:
< Coding >
from haystack.utils import fetch_archive_from_http
doc_dir = "data/Phase1_test01"
url = "https://github.com/dkbs12/External_test/raw/main/Phase1_test_data.zip"
fetch_archive_from_http(url=url, output_dir=doc_dir)
# ... (the rest of the code is the same as in the original tutorial)
from haystack.pipelines import DocumentSearchPipeline
from haystack.utils import print_documents
p_retrieval = DocumentSearchPipeline(embedding_retriever)
res = p_retrieval.run(query="What is NDC?", params={"Retriever": {"top_k": 10}})
print_documents(res, max_text_len=200)
< Result >
Query: What is NDC?
{ 'content': 'New Distribution Capability (NDC) in Air Travel: Airlines, '
'GDSs, and Impact on the Industry\n'
'\n'
'Two fundamental needs connect all airlines: revenue and '
'passenger satisfaction. To satisfy customers, carri...',
'name': None}
{ 'content': 'To understand the real value of NDC and why it has emerged at '
'all, we need to look back in time.\n'
'\n'
'History of air distribution\n'
'The distribution system in the air travel industry includes '
'many players i...',
'name': None}
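Both printouts come from the same lookup. As far as I can tell from the Haystack 1.x source, `print_documents` reads each result's file name from the document's `meta`, falling back to `None` when there is no `"name"` key, so `'name': None` suggests these documents were indexed without that meta field. The function `summarize` below is a hypothetical, simplified stand-in for how one printed entry is built, not the library's actual code:

```python
from typing import Any, Dict, Optional


def summarize(content: str, meta: Dict[str, Any], max_text_len: Optional[int] = None) -> Dict[str, Any]:
    """Hypothetical stand-in for one entry printed by print_documents."""
    if max_text_len is not None and len(content) > max_text_len:
        # Truncate long content and mark the cut, as in the printouts above.
        content = content[:max_text_len] + "..."
    # The file name is looked up in meta; if no "name" key was stored
    # at indexing time, the lookup falls back to None.
    return {"content": content, "name": meta.get("name", None)}
```

Passing `print_meta=True` to `print_documents` should print the full `meta` of each result, which is a quick way to check what was actually stored.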
How can I get the file names to show up in the results of the code above?