Open
Description
Search functions now return unexpected values for the content field.
import json
import sys
from pprint import pprint


def search(cursor, query_embedding):
    """Search matching documents for a given query and return a list of dicts."""
    data = {
        'text': ' '.join(sys.argv[1:]),
        'query_embedding': query_embedding,
        'match_threshold': 0.5,
        'match_count': 10,
        # FINESSE_JSON_PARSED_WEIGHTS is defined elsewhere in the module
        'weights': json.dumps(FINESSE_JSON_PARSED_WEIGHTS)
    }
    cursor.execute("""
        SELECT *
        FROM search(%(text)s, %(query_embedding)s::vector, %(match_threshold)s,
            %(match_count)s, %(weights)s::JSONB)
    """, data)
    results = cursor.fetchall()
    pprint(results)  # debug print of the raw rows
    # convert each result into a plain dict
    return [dict(r) for r in results[0]["search"]]
Printed value for "who's the president?":
[
    {
        "search": [
            {
                "content": "2004",
                "id": "1bc6def7-339e-49cf-996c-8559c8f074b0",
                "last_updated": "2022-08-02",
                "query_id": "b685ff7a-7bc3-40c7-88de-8f6183a13f2a",
                "score": 0.43899683490982105,
                "scores": {
                    "current": 1,
                    "recency": 0.8294036718602102,
                    "similarity": 0.778211130010486,
                    "traffic": 0.08711416212970638,
                    "typicality": 0.0012760527435133986,
                },
                "subtitle": "2004",
                "title": "L'ACIA : beaucoup de chemin parcouru depuis 1997 - "
                         "Agence canadienne d'inspection des aliments",
                "url": "https://inspection.canada.ca/inspecter-et-proteger/salubrite-des-aliments/chemin-parcouru/fra/1655235733366/1655496070310",
            },
            {
                "content": "2004",
                "id": "9a8ba37c-8b15-4b3f-9f5d-49ca50f92d35",
                "last_updated": "2022-08-02",
                "query_id": "b685ff7a-7bc3-40c7-88de-8f6183a13f2a",
                "score": 0.43874722259501314,
                "scores": {
                    "current": 1,
                    "recency": 0.8294036718602102,
                    "similarity": 0.778211130010486,
                    "traffic": 0.08586610055566671,
                    "typicality": 0.0012760527435133986,
                },
                "subtitle": "2004",
                "title": "L'ACIA : beaucoup de chemin parcouru depuis 1997 - "
                         "Agence canadienne d'inspection des aliments",
                "url": "https://inspection.canada.ca/inspecter-et-proteger/sante-des-vegetaux/chemin-parcouru/fra/1655235733366/1655496024138",
            }
        ]
    }
]
- The chunk's full text or a snippet is expected in the content field.
- The subtitle values also look wrong (they repeat "2004"), although I'm not sure what is supposed to be received there.
- I suggest we receive the value (list of results) of the search field directly instead of [ { "search": [...] } ].
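The last suggestion could be handled on the client side until the SQL function changes. A minimal sketch, assuming the rows always have the [ { "search": [...] } ] shape printed above; the helper name unwrap_search_rows and the handling of an empty or NULL result are assumptions, not part of the existing code:

```python
def unwrap_search_rows(rows):
    """Return the bare list of results from rows shaped like [{"search": [...]}].

    Hypothetical helper: the name and the empty/NULL handling are
    assumptions made for illustration.
    """
    if not rows:
        # No row came back at all.
        return []
    results = rows[0]["search"]
    # Assumption: the "search" column may be NULL (None) when nothing matches.
    return [dict(r) for r in (results or [])]
```

With this, a caller gets a flat list of result dicts, the same shape as the azure-db response.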
For comparison we receive this from azure-db:
[
    {
        "content": "Kochhar\n \n \n CFIA <strong>President</strong>\n Dr.",
        "id": "ZGI2M2VmNjktNTVmMC00ODQ2LThlZWItZDljYTYwZDMwNTI10",
        "last_updated": "2023-04-18T00:00:00Z",
        "score": 8.465198,
        "title": "Dr. Harpreet S. Kochhar - Canadian Food Inspection Agency",
        "url": "https://inspection.canada.ca/about-cfia/organizational-structure/cfia-president/eng/1681496883837/1681496884212",
    },
    {
        "content": "<strong>The</strong> CFIA is headed by a "
                   "<strong>President</strong>, who has <strong>the</strong> rank "
                   "and all <strong>the</strong> powers of a Deputy Head of a "
                   "Department.",
        "id": "ZDg1YzNlMjEtMzhjMS00NTQ2LWFhYTEtM2ZjOWUyOTZmZWFm0",
        "last_updated": "2017-08-28T00:00:00Z",
        "score": 8.322858,
        "title": "Canadian Food Inspection Agency (CFIA) - Quarterly Financial "
                 "Report (QFR) for the Quarter ended June 30, 2017 - Canadian Food "
                 "Inspection Agency",
        "url": "https://inspection.canada.ca/about-cfia/transparency/corporate-management-reporting/reports-to-parliament/financial-reporting/quarter-ended-june-30-2017/eng/1502989987656/1502989988316",
    },
]
Acceptance criteria
- Full text or relevant snippet is received in the content field
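One way to check this criterion automatically is a small heuristic over each result. This is only a sketch: the function name has_usable_content and the MIN_CONTENT_LENGTH threshold are hypothetical, chosen so that the "2004" values reported above would be rejected.

```python
MIN_CONTENT_LENGTH = 20  # hypothetical threshold, not taken from the issue


def has_usable_content(result, min_length=MIN_CONTENT_LENGTH):
    """Heuristically decide whether a result's content looks like real text.

    Assumption: content that is very short, or that merely repeats the
    subtitle, is treated as the faulty case described in this issue.
    """
    content = (result.get("content") or "").strip()
    if len(content) < min_length:
        return False
    subtitle = (result.get("subtitle") or "").strip()
    return content != subtitle
```

A test could then assert has_usable_content(r) for every r returned by search.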
rngadam