
add missing q param for item search and collection-items search #267


Open: wants to merge 3 commits into main


@fmigneault commented Jul 16, 2025

Related Issue(s):

Description:

Whether the q basic/advanced extensions get moved, split, or otherwise reorganized, they currently define the q parameter on other endpoints that are not handled by the API (only /collections?q=... was handled).

This adds the q parameter to /search and /collections/{collection_id}/items as well.
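For illustration, here is how the three endpoints that would accept q after this change could be called with plain GET requests. The base URL `https://stac.example.com` and the collection id `era5-hourly` are hypothetical placeholders; the sketch only builds the URLs with the standard library and does not contact a server.

```python
from urllib.parse import quote, urlencode

# Hypothetical STAC API root; replace with a real deployment.
BASE_URL = "https://stac.example.com"


def free_text_urls(q: str, collection_id: str) -> dict[str, str]:
    """Build GET URLs that pass the free-text `q` parameter to each endpoint."""
    query = urlencode({"q": q})  # spaces become '+', special chars are escaped
    return {
        # Collection search: the only endpoint that handled q before this PR.
        "collections": f"{BASE_URL}/collections?{query}",
        # Item search across all collections.
        "search": f"{BASE_URL}/search?{query}",
        # Item search scoped to one collection (the id is URL-encoded too).
        "items": f"{BASE_URL}/collections/{quote(collection_id, safe='')}/items?{query}",
    }


if __name__ == "__main__":
    for name, url in free_text_urls("temperature", "era5-hourly").items():
        print(name, url)
```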

PR Checklist:

  • pre-commit hooks pass locally
  • Tests pass (run make test)
  • Documentation has been updated to reflect changes, if applicable, and docs build successfully (run make docs)
  • Changes are added to the CHANGELOG.

@vincentsarago (Member)

Thanks @fmigneault

I already started #263, but wanted to add tests before merging.

@hrodmn (Collaborator) commented Jul 17, 2025

Hi @fmigneault - thanks for opening this PR. We intentionally left q out of the item search endpoints because pgstac only runs the search using the q parameter on three collection-specific fields (title, description, keywords) [ref].

The free-text search extension is potentially very powerful but the current implementation is somewhat limited (see below). Perhaps cql2-json filters could help you achieve your goals?
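As a rough sketch of the cql2-json route, the body below asks POST /search for items whose `description` property contains "temperature", using the CQL2 `like` operator. The collection id and the `description` property are assumptions about how the items are described, and `like` belongs to CQL2's advanced comparison operators, so the server has to support that conformance class. Note that `like` is case-sensitive and that `%` or `_` inside the term would act as wildcards.

```python
import json


def description_contains(term: str, collections: list[str]) -> dict:
    """Build a POST /search body filtering items by a substring of `description`.

    `description` is an assumed item property; adjust to the fields your items carry.
    """
    return {
        "collections": collections,
        "filter-lang": "cql2-json",
        "filter": {
            # CQL2 LIKE: '%' matches any run of characters (case-sensitive).
            "op": "like",
            "args": [{"property": "description"}, f"%{term}%"],
        },
        "limit": 10,
    }


if __name__ == "__main__":
    body = description_contains("temperature", ["era5-hourly"])
    print(json.dumps(body, indent=2))
```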

The q parameter is technically available in the pgstac-side item search code, but unless you have the title, description, or keywords fields defined in your item properties, it would not be helpful without some changes in pgstac.

The STAC API Free-Text Extension states:

This defines a new parameter, q that allows the user to perform free-text queries against STAC metadata. The value of the parameter is a string and is passed to the underlying backend for free-text searches. The specific set of text fields to which the parameter is applied is left to the discretion of the implementation, but a recommendation is to at least consider:

  • Collections: title, description and keywords
  • Catalog: title, description
  • Item: all relevant textual properties

It is also allowed to query against all text fields.

We did not include any other fields in the pgstac implementation because there aren't any canonical text fields in the item spec, and pgstac just stuffs all fields except assets, links, stac_extensions, stac_version, collection, datetime, bbox, and geometry into a JSONB column called properties. Without knowing the names of specific text fields, it would not be advisable to perform a search using the tsquery approach. Furthermore, the text-search vectors get built on the fly, which is fine in the collection-search context because there are not too many rows, but for item searches it could be very slow!
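To make the scoping concrete, here is a plain-Python approximation of the collection-level behaviour described above: q is only compared against title, description and keywords. It uses lowercase word matching as a stand-in for PostgreSQL's tsvector/tsquery machinery (no stemming, ranking or phrase logic), and OR-ing the words of q is a simplification, so this illustrates which fields are searched, not pgstac's actual SQL.

```python
import re

# Fields searched for collections, per the pgstac behaviour described above.
COLLECTION_TEXT_FIELDS = ("title", "description", "keywords")


def words(text: str) -> set[str]:
    """Lowercase alphanumeric words: a crude stand-in for a tsvector."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def collection_matches(collection: dict, q: str) -> bool:
    """True if any word of q appears in the collection's title, description or keywords.

    Simplification: the words of q are OR-ed; no stemming, ranking or phrase logic.
    """
    vocab: set[str] = set()
    for field in COLLECTION_TEXT_FIELDS:
        value = collection.get(field)
        if isinstance(value, str):
            vocab |= words(value)
        elif isinstance(value, list):  # keywords is a list of strings
            for keyword in value:
                if isinstance(keyword, str):
                    vocab |= words(keyword)
    # Any other field (summaries, extent, license, ...) is never looked at.
    return bool(words(q) & vocab)
```

For a collection whose description mentions "2m temperature", `collection_matches(c, "temperature")` is true, while a term that only appears in, say, `summaries` is never found.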

I am really curious to understand the use-case for free-text search in items - can you share a bit more about the types of searches you are trying to enable?

@fmigneault (Author) commented Jul 17, 2025

@hrodmn
Thanks for the information. I am aware of the CQL2 approaches and the free-text search limitations. I am trying to address the needs of users who are not programmers and just want minimal searches using natural language. We are planning to provide minimal descriptions in our datasets to help in this effort.

Potentially, text search could be done over all item properties as a global string, but I am aware it is not the case for now. I would not expect property-specific search using q. If that is required by a user, I would guide them toward other more advanced search capabilities like using query, filter and CQL2.

@hrodmn (Collaborator) commented Jul 17, 2025

Sounds good, thanks for the additional context @fmigneault.

We are planning to provide minimal descriptions in our datasets to help in this effort.

If your datasets are represented as STAC collections, then adding a detailed description field and some keywords will make them easily searchable using /collections?q={query}. What are the datasets that you are considering adding as STAC items?

Potentially, text search could be done over all item properties as a global string, but I am aware it is not the case for now.

Yeah, we could just represent the whole properties field as a string and perform the tsquery comparison against the entire thing. I haven't tried this but I fear the operation would be quite costly to the pgstac database.
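The "whole properties as one string" idea can be sketched in a few lines: recursively collect every string in an item's properties, join them, and test the query terms against that blob. This is a hypothetical Python illustration, not pgstac code; it treats q as whitespace-separated terms that must all appear as substrings, which is an assumed semantic. Because the blob is rebuilt for every item on every query, it also shows where the cost concern comes from.

```python
def collect_strings(value) -> list[str]:
    """Recursively gather strings (dict keys included) from nested JSON values."""
    if isinstance(value, str):
        return [value]
    if isinstance(value, dict):
        found: list[str] = []
        for key, inner in value.items():
            # Keys are kept because variable names may appear as keys, not values.
            found.append(str(key))
            found.extend(collect_strings(inner))
        return found
    if isinstance(value, list):
        return [s for inner in value for s in collect_strings(inner)]
    return []  # numbers, booleans and null carry no free text


def properties_blob(item: dict) -> str:
    """Flatten item['properties'] into one lowercase string."""
    return " ".join(collect_strings(item.get("properties", {}))).lower()


def item_matches(item: dict, q: str) -> bool:
    """True if every whitespace-separated term of q is a substring of the blob."""
    blob = properties_blob(item)
    return all(term in blob for term in q.lower().split())
```

Substring matching is deliberately loose here: "temp" would also match "temperature", which suits rough exploration but not precise filtering.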

@fmigneault (Author)

We have a mixture of climate datasets with >300K items in the same collection, and some Earth-observation datasets with machine-learning annotations on the order of ~30K items/samples, split into 3 collections (train, test, validate). Therefore, we need some level of granularity/flexibility to filter, but we can add common keywords like the class category, ML-specific details (e.g., "train"), or other similar common terms to provide a very rough search.
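Since the pgstac item search only looks at title, description and keywords (as noted earlier in the thread), the rough-search plan could be prepared by stamping those common terms into each item's properties at ingestion time. A minimal sketch, assuming items are plain STAC Item dicts; the function name and its `split`, `category` and `extra` arguments are hypothetical.

```python
def add_search_keywords(item: dict, split: str, category: str,
                        extra: tuple[str, ...] = ()) -> dict:
    """Return a copy of a STAC Item dict with common search terms in properties.keywords.

    The input item is not modified; blank terms and duplicates are skipped.
    """
    props = dict(item.get("properties", {}))
    keywords = list(props.get("keywords", []))
    for word in (split, category, *extra):
        if word and word not in keywords:
            keywords.append(word)
    props["keywords"] = keywords
    return {**item, "properties": props}
```

Returning a copy keeps the function safe to call repeatedly in an ingestion pipeline: re-running it on an already tagged item adds nothing.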

Sometimes, we have use cases like users wanting to find "temperature" or "precipitation" information, but as non-experts just looking around, they are not aware that the actual variable is temperature_2m. The approximate "temperature" search allows them to find relatively good results, though we are not expecting them to be the "best match".
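The temperature_2m example hinges on how identifiers are tokenized. If a variable name is kept as a single token, a whole-word search for "temperature" misses it; splitting identifiers on underscores and other separators lets the rough query land. The hypothetical tokenizers below only illustrate that choice; how PostgreSQL's own text-search parser treats underscores is not assumed here.

```python
import re


def whole_tokens(text: str) -> set[str]:
    """Split on whitespace only: 'temperature_2m' stays one token."""
    return set(text.lower().split())


def identifier_tokens(text: str) -> set[str]:
    """Also split identifiers on non-alphanumeric separators (_ - : . /)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(query: str, text: str, tokenizer) -> bool:
    """True if every query token is among the text's tokens."""
    return tokenizer(query) <= tokenizer(text)


if __name__ == "__main__":
    variable = "temperature_2m"
    print(matches("temperature", variable, whole_tokens))       # False
    print(matches("temperature", variable, identifier_tokens))  # True
```

The trade-off is noise: splitting also yields fragments such as "2m" or "eo", which makes results broader and less precise, consistent with the "relatively good results, not the best match" expectation.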

In the longer run, we are planning to integrate augmented search (e.g., using LLMs and whatnot), but the q parameter would then be pre-translated to match the format of the relevant search parameters.
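One way to picture pre-translating q is a step that maps recognised terms to structured search parameters before the request reaches the API, keeping the remaining words as free text. The sketch below uses a hypothetical hand-written synonym table in place of an LLM; the `variable` property name and sending the leftover words as a string `q` in the POST body are assumptions about the target API.

```python
# Hypothetical vocabulary mapping lay terms to dataset variable names.
SYNONYMS = {
    "temperature": "temperature_2m",
    "precipitation": "total_precipitation",
}


def translate(q: str) -> dict:
    """Turn a free-text q into a POST /search body.

    Known terms become CQL2-JSON equality filters on an assumed `variable`
    property; unrecognised words are kept as free text in `q`.
    """
    clauses = []
    leftover = []
    for word in q.lower().split():
        if word in SYNONYMS:
            clauses.append({"op": "=", "args": [{"property": "variable"}, SYNONYMS[word]]})
        else:
            leftover.append(word)

    body: dict = {}
    if clauses:
        body["filter-lang"] = "cql2-json"
        # A single clause is used directly; several are OR-ed together.
        body["filter"] = clauses[0] if len(clauses) == 1 else {"op": "or", "args": clauses}
    if leftover:
        body["q"] = " ".join(leftover)
    return body
```

For example, `translate("temperature over Quebec")` yields an equality filter on `temperature_2m` plus `q="over quebec"`, so the precise part goes through the filter and only the vague remainder relies on free-text search.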

Development

Successfully merging this pull request may close these issues.

Free-text search parameter 'q' only works for collection search.
3 participants