Regarding "Tutorial: How to Use Pipelines" #5505

dkbs12 · 2023-08-03T12:43:22Z

Hello,
I'm working through "Tutorial: How to Use Pipelines" and I have some questions about it.

First, when I run the code below on the documents provided by Haystack ("wiki_gameofthrones_txt11.zip"), the result includes the file name of each document.

< Coding >
from haystack.pipelines import DocumentSearchPipeline
from haystack.utils import print_documents

p_retrieval = DocumentSearchPipeline(embedding_retriever)
res = p_retrieval.run(query="Who is the father of Arya Stark?", params={"Retriever": {"top_k": 10}})
print_documents(res, max_text_len=200)

< Result >
Query: Who is the father of Arya Stark?

{ 'content': '\n'
'=== Background ===\n'
'Arya is the third child and younger daughter of Eddard and '
'Catelyn Stark and is nine years old at the beginning of the '
'book series. She has five siblings: an older brother Robb, '
'a...',
'name': '43_Arya_Stark.txt'}

{ 'content': '\n'
'===Arya Stark===\n'
"'''Arya Stark''' portrayed by Maisie Williams. Arya Stark of "
'House Stark is the younger daughter and third child of Lord '
'Eddard and Catelyn Stark of Winterfell. Ever the tomboy, '
'Arya...',
'name': '349_List_of_Game_of_Thrones_characters.txt'}

However, when I use my own documents ("Phase1_test_data.zip"), the file name is missing from the result
(every name is shown as "None").
The document file, the code, and the result are below:

< Coding >
from haystack.utils import fetch_archive_from_http

doc_dir = "data/Phase1_test01"
url = "https://github.com/dkbs12/External_test/raw/main/Phase1_test_data.zip"
fetch_archive_from_http(url=url, output_dir=doc_dir)
.
. (the rest of the code is the same as in the original tutorial)
.
from haystack.pipelines import DocumentSearchPipeline
from haystack.utils import print_documents

p_retrieval = DocumentSearchPipeline(embedding_retriever)
res = p_retrieval.run(query="What is NDC?", params={"Retriever": {"top_k": 10}})
print_documents(res, max_text_len=200)

< Result >
Query: What is NDC?

{ 'content': 'New Distribution Capability (NDC) in Air Travel: Airlines, '
'GDSs, and Impact on the Industry\n'
'\n'
'Two fundamental needs connect all airlines: revenue and '
'passenger satisfaction. To satisfy customers, carri...',
'name': None}

{ 'content': 'To understand the real value of NDC and why it has emerged at '
'all, we need to look back in time.\n'
'\n'
'History of air distribution\n'
'The distribution system in the air travel industry includes '
'many players i...',
'name': None}

How can I get the file name to appear in the result of the code above?

anakin87 (Maintainer) · 2023-08-03T15:17:03Z

Hello, @dkbs12!

I've prepared a Colab notebook for you, with a minimal version of the tutorial, working on your files.

Since your files are of different formats, I modified the installation command to reflect this.

I get these results:

Query: What is NDC?

{ 'content': 'New Distribution Capability (NDC) in Air Travel: Airlines, '
'GDSs, and Impact on the Industry\n'
'Two fundamental needs connect all airlines: revenue and '
'passenger satisfaction. To satisfy customers, carrie...',
'name': 'New_Distribution_Capability_in_Air_Travel.txt'}

{ 'content': 'Transitioning to a Future of Intelligent Dynamic Offers\n'
'A Datalex white paper on the important developments in '
'Continuous Pricing and Dynamic Offer Generation for a future '
'of rich airline retailing. F...',
'name': 'Transitioning_to_a_Future_of_Intelligent_Dynamic_Offers.docx'}

{ 'content': 'Dynamic pricing of airline offers\n'
'Received: 22 October 2017 / Accepted: 9 March 2018 / Published '
'online: 12 April 2018\n'
'Ó Macmillan Publishers Ltd., part of Springer Nature 2018\n'
'Abstract Airlines have ...',
'name': 'Dynamic_pricing_of_airline_offers.pdf'}

To better understand how to handle different file types, I suggest having a look at the Preprocessing tutorial. I also suggest splitting your documents using the PreProcessor, as explained in the linked tutorial....
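
For readers without access to the notebook, here is a minimal sketch of the underlying idea; it is not the notebook's code. It assumes that the printed `name` comes from each document's `meta["name"]`, so a document indexed without that key shows up as `None`. The helpers `text_files_to_dicts` and `indices_without_name` are hypothetical names used for illustration only, and the sketch handles plain `.txt` files only; PDF and DOCX files need a proper converter. As far as I know, Haystack 1.x document stores accept dictionaries of this shape (`content` plus `meta`) in `write_documents`, but please check this against your version.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def text_files_to_dicts(dir_path: str) -> List[Dict]:
    """Read every .txt file in dir_path into a dict that carries the file name in meta."""
    docs = []
    for path in sorted(Path(dir_path).glob("*.txt")):
        docs.append(
            {
                "content": path.read_text(encoding="utf-8"),
                # Assumption: the "name" key is what ends up printed as 'name'.
                "meta": {"name": path.name},
            }
        )
    return docs


def indices_without_name(docs: List[Dict]) -> List[int]:
    """Return the positions of documents whose meta has no usable 'name'."""
    return [i for i, d in enumerate(docs) if not (d.get("meta") or {}).get("name")]


# Small self-contained demo on two throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("What is NDC?", encoding="utf-8")
    Path(tmp, "b.txt").write_text("History of air distribution", encoding="utf-8")
    docs = text_files_to_dicts(tmp)

# A document without metadata is flagged, so it can be fixed before indexing.
unnamed = indices_without_name(docs + [{"content": "no metadata here"}])
```

Running a check like `indices_without_name` before writing documents to the document store is a quick way to confirm whether missing metadata explains the `None` values.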
