Search functions return wrong content in ailab-db #91

@k-allagbe

Description

The search function now returns unexpected values in the content field: a bare year such as "2004" instead of the chunk text.

import json
import sys
from pprint import pprint


def search(cursor, query_embedding):
    """Search matching documents with a given query and return a list of dicts."""
    # FINESSE_JSON_PARSED_WEIGHTS is assumed to be defined elsewhere in the project.
    data = {
        'text': ' '.join(sys.argv[1:]),  # query text is taken from the command line
        'query_embedding': query_embedding,
        'match_threshold': 0.5,
        'match_count': 10,
        'weights': json.dumps(FINESSE_JSON_PARSED_WEIGHTS)
    }

    cursor.execute("""
        SELECT *
        FROM search(%(text)s, %(query_embedding)s::vector, %(match_threshold)s,
                   %(match_count)s, %(weights)s::JSONB)
    """, data)
    # turn into list of dict now to preserve dictionaries
    results = cursor.fetchall()
    pprint(results)  # debug print of the raw rows
    # the single returned row holds every result in its "search" column
    return [dict(r) for r in results[0]["search"]]

Printed value for the query "who's the president?":

[
    {
        "search": [
            {
                "content": "2004",
                "id": "1bc6def7-339e-49cf-996c-8559c8f074b0",
                "last_updated": "2022-08-02",
                "query_id": "b685ff7a-7bc3-40c7-88de-8f6183a13f2a",
                "score": 0.43899683490982105,
                "scores": {
                    "current": 1,
                    "recency": 0.8294036718602102,
                    "similarity": 0.778211130010486,
                    "traffic": 0.08711416212970638,
                    "typicality": 0.0012760527435133986,
                },
                "subtitle": "2004",
                "title": "L'ACIA : beaucoup de chemin parcouru depuis 1997 - "
                "Agence canadienne d'inspection des aliments",
                "url": "https://inspection.canada.ca/inspecter-et-proteger/salubrite-des-aliments/chemin-parcouru/fra/1655235733366/1655496070310",
            },
            {
                "content": "2004",
                "id": "9a8ba37c-8b15-4b3f-9f5d-49ca50f92d35",
                "last_updated": "2022-08-02",
                "query_id": "b685ff7a-7bc3-40c7-88de-8f6183a13f2a",
                "score": 0.43874722259501314,
                "scores": {
                    "current": 1,
                    "recency": 0.8294036718602102,
                    "similarity": 0.778211130010486,
                    "traffic": 0.08586610055566671,
                    "typicality": 0.0012760527435133986,
                },
                "subtitle": "2004",
                "title": "L'ACIA : beaucoup de chemin parcouru depuis 1997 - "
                "Agence canadienne d'inspection des aliments",
                "url": "https://inspection.canada.ca/inspecter-et-proteger/sante-des-vegetaux/chemin-parcouru/fra/1655235733366/1655496024138",
            }
        ]
    }
]
  1. The content field should contain the chunk's full text or a relevant snippet; here it contains only "2004".
  2. The subtitle values also look wrong, although I'm not sure what is supposed to be returned there.
  3. I suggest we receive the value of the search field (the list of results) directly instead of [ { "search": [...] } ].
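
As a rough illustration of point 3, the wrapper row could be unwrapped on the client side along these lines. This is only a sketch: `unwrap_search_rows` is a hypothetical helper name, the sample rows are made up, and treating an empty or NULL `search` column as "no results" is an assumption rather than documented behaviour of ailab-db.

```python
def unwrap_search_rows(rows):
    """Return the flat list of result dicts from rows shaped like [{"search": [...]}]."""
    if not rows:
        # No row came back at all: treat it as an empty result list.
        return []
    # The query returns a single row whose "search" column holds every result.
    results = rows[0]["search"]
    # Assumption: a NULL column (no matches) is also treated as an empty list.
    return [dict(r) for r in (results or [])]


# Made-up sample rows in the same shape as the printed value above.
rows = [{"search": [{"id": "doc-1", "content": "2004"},
                    {"id": "doc-2", "content": "2004"}]}]
flat = unwrap_search_rows(rows)
print([r["id"] for r in flat])
```

Doing the unwrapping in the SQL function itself would avoid the extra step, but that depends on how the function is defined in the database.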

For comparison, this is what we receive from azure-db:

[
    {
        "content": "Kochhar\n   \n \n  CFIA <strong>President</strong>\n Dr.",
        "id": "ZGI2M2VmNjktNTVmMC00ODQ2LThlZWItZDljYTYwZDMwNTI10",
        "last_updated": "2023-04-18T00:00:00Z",
        "score": 8.465198,
        "title": "Dr. Harpreet S. Kochhar - Canadian Food Inspection Agency",
        "url": "https://inspection.canada.ca/about-cfia/organizational-structure/cfia-president/eng/1681496883837/1681496884212",
    },
    {
        "content": "<strong>The</strong> CFIA is headed by a "
        "<strong>President</strong>, who has <strong>the</strong> rank "
        "and all <strong>the</strong> powers of a Deputy Head of a "
        "Department.",
        "id": "ZDg1YzNlMjEtMzhjMS00NTQ2LWFhYTEtM2ZjOWUyOTZmZWFm0",
        "last_updated": "2017-08-28T00:00:00Z",
        "score": 8.322858,
        "title": "Canadian Food Inspection Agency (CFIA) - Quarterly Financial "
        "Report (QFR) for the Quarter ended June 30, 2017 - Canadian Food "
        "Inspection Agency",
        "url": "https://inspection.canada.ca/about-cfia/transparency/corporate-management-reporting/reports-to-parliament/financial-reporting/quarter-ended-june-30-2017/eng/1502989987656/1502989988316",
    },
]

Acceptance criteria

  • Full text or relevant snippet is received in the content field
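
One possible way to check this criterion automatically is a small heuristic over the returned results. The sketch below is an assumption-heavy illustration: `suspicious_content` and the `MIN_CONTENT_CHARS` threshold are hypothetical and not part of ailab-db, and the rule (content too short, or identical to the subtitle) is simply modelled on the "2004" results shown in this issue.

```python
MIN_CONTENT_CHARS = 20  # hypothetical threshold, not taken from the project


def suspicious_content(result, min_chars=MIN_CONTENT_CHARS):
    """Return True when the content field looks like a heading rather than chunk text."""
    content = (result.get("content") or "").strip()
    subtitle = (result.get("subtitle") or "").strip()
    # Flag very short content, or content that merely repeats the subtitle.
    return len(content) < min_chars or (subtitle != "" and content == subtitle)


# Illustrative inputs modelled on the two outputs in this issue.
bad = {"content": "2004", "subtitle": "2004"}
good = {"content": "The CFIA is headed by a President."}
print(suspicious_content(bad), suspicious_content(good))
```

A check like this could run over the list returned by the search function in a test, failing when any result is flagged.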
