Feature request: Enhanced Geospatial and Temporal Search #740

@nicolas-geysse

Here's a plan to enhance TxtAI with geospatial and temporal search capabilities:

1. Extend indexing for geospatial data:

  • Use GeoPandas for geospatial data handling, as it integrates well with NetworkX.
  • Implement a GeospatialGraph class that extends TxtAI's existing Graph:
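import geopandas as gpd
from txtai.graph import Graph

# Proposal sketch: the base-class calls assume a no-argument constructor and an
# add_node method; txtai's Graph may differ (for example addnode), so adapt as needed.
class GeospatialGraph(Graph):
    def __init__(self):
        super().__init__()
        # Rows are kept in a plain list; the GeoDataFrame is rebuilt from it on insert
        self.rows = []
        self.gdf = gpd.GeoDataFrame(columns=['node_id', 'geometry'], geometry='geometry')

    def add_node(self, node_id, geometry, **attr):
        super().add_node(node_id, **attr)
        # DataFrame.append was removed in pandas 2.0, so rebuild the frame from the row list
        self.rows.append({'node_id': node_id, 'geometry': geometry})
        self.gdf = gpd.GeoDataFrame(self.rows, geometry='geometry')

    def spatial_query(self, geometry, predicate='intersects'):
        # Resolve the predicate (e.g. intersects, within, contains) as a GeoSeries method
        mask = getattr(self.gdf.geometry, predicate)(geometry)
        return self.gdf.loc[mask, 'node_id'].tolist()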
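
For larger node sets, testing every geometry against the query shape can become slow. A minimal sketch of the same lookup through the GeoPandas spatial index follows. The helper name `spatial_query_indexed` and the sample points are hypothetical and not part of txtai or this proposal; the sketch assumes a GeoPandas version that provides `sindex.query` with a `predicate` argument.

```python
import geopandas as gpd
from shapely.geometry import Point


def spatial_query_indexed(gdf, geometry, predicate="intersects"):
    """Return node_id values for rows matched through the spatial index."""
    # sindex.query returns integer row positions. The predicate is evaluated as
    # predicate(query_geometry, row_geometry), which matters for asymmetric
    # predicates such as "within" or "contains".
    positions = gdf.sindex.query(geometry, predicate=predicate)
    return gdf.iloc[sorted(positions)]["node_id"].tolist()


# Hypothetical sample data: three nodes, two of them within one unit of the origin
gdf = gpd.GeoDataFrame(
    {"node_id": ["a", "b", "c"]},
    geometry=[Point(0, 0), Point(0.5, 0.5), Point(5, 5)],
)
nearby = spatial_query_indexed(gdf, Point(0, 0).buffer(1))
```

The index first narrows candidates by bounding box and only then applies the exact predicate, so the cost depends on the number of nearby nodes rather than on the total.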

2. Implement temporal search functionalities:

  • Use pandas for temporal data handling, as it's already part of TxtAI's ecosystem.
  • Extend the Graph class to include temporal attributes:
import pandas as pd

class TemporalGraph(Graph):
    def __init__(self):
        super().__init__()
        # Maps node_id -> timestamp so that queries can return node IDs
        self.timestamps = {}

    def add_node(self, node_id, timestamp, **attr):
        super().add_node(node_id, **attr)
        self.timestamps[node_id] = pd.Timestamp(timestamp)

    def temporal_query(self, start_time, end_time):
        # Return the IDs of nodes whose timestamp lies in [start_time, end_time]
        times = pd.Series(self.timestamps, dtype='datetime64[ns]')
        mask = (times >= start_time) & (times <= end_time)
        return times[mask].index.tolist()
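
When there are many nodes, a range lookup on a sorted DatetimeIndex avoids comparing every timestamp. The sketch below assumes node IDs are stored as the values of a Series indexed by timestamp; `node_times` and `ids_between` are illustrative names, not part of txtai.

```python
import pandas as pd

# Hypothetical sample: node IDs as values, timestamps as a sorted index
node_times = pd.Series(
    ["n1", "n2", "n3"],
    index=pd.DatetimeIndex(["2023-01-01", "2021-06-15", "2023-09-30"]),
).sort_index()


def ids_between(series, start_time, end_time):
    """Return node IDs with start_time <= timestamp <= end_time."""
    # Label slicing on a sorted DatetimeIndex includes both endpoints
    return series.loc[start_time:end_time].tolist()


hits = ids_between(node_times, pd.Timestamp("2022-01-01"), pd.Timestamp("2024-01-01"))
```

The returned IDs come back in timestamp order, which may or may not be the order a caller wants once they are combined with semantic scores.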

3. Integrate with existing semantic search:

  • Create a combined SpatioTemporalSemanticGraph class:
from txtai.embeddings import Embeddings

class SpatioTemporalSemanticGraph(GeospatialGraph, TemporalGraph):
    def __init__(self):
        super().__init__()
        self.embeddings = Embeddings()

    def add_node(self, node_id, geometry, timestamp, text, **attr):
        # Pass timestamp by keyword so it travels through GeospatialGraph
        # to TemporalGraph along the method resolution order
        super().add_node(node_id, geometry, timestamp=timestamp, **attr)
        # upsert adds to the existing index; index() would rebuild it on every call
        self.embeddings.upsert([(node_id, text, None)])

    def search(self, query, geometry=None, start_time=None, end_time=None, limit=10):
        # Semantic results are (id, score) pairs; the filters below run after
        # truncation to limit, so fewer than limit results may be returned
        results = self.embeddings.search(query, limit)

        if geometry is not None:
            spatial_results = set(self.spatial_query(geometry))
            results = [r for r in results if r[0] in spatial_results]

        if start_time is not None and end_time is not None:
            temporal_results = set(self.temporal_query(start_time, end_time))
            results = [r for r in results if r[0] in temporal_results]

        return results
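
One caveat with post-filtering: the semantic search is truncated to `limit` before the spatial and temporal filters run, so the final list can be shorter than `limit` even when more matching nodes exist. A minimal sketch of one workaround, over-fetching and then filtering, follows. The `search_fn` callable stands in for `embeddings.search`; it, `filtered_search`, and the `factor` parameter are hypothetical names used only for illustration.

```python
def filtered_search(search_fn, query, allowed_ids, limit=10, factor=5):
    """Over-fetch ranked (id, score) pairs, keep allowed ids, return at most limit."""
    # Request more candidates than needed so filtering still leaves enough results
    candidates = search_fn(query, limit * factor)
    kept = [(uid, score) for uid, score in candidates if uid in allowed_ids]
    return kept[:limit]


# Stand-in ranking: a fixed, score-ordered list instead of a real embeddings index
ranked = [("a", 0.9), ("b", 0.8), ("c", 0.7), ("d", 0.6)]


def fake_search(query, k):
    return ranked[:k]


top = filtered_search(fake_search, "sample", allowed_ids={"b", "d"}, limit=2, factor=2)
```

With `limit=2` and no over-fetch, only "a" and "b" would be considered and a single result would survive the filter; fetching four candidates lets both allowed nodes through. The right `factor` depends on how selective the filters are.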

This implementation:

  1. Uses GeoPandas for geospatial indexing, which is compatible with NetworkX.
  2. Utilizes pandas for temporal indexing, which is already part of TxtAI's ecosystem.
  3. Layers the spatial and temporal filters on top of TxtAI's existing semantic search results.
  4. Provides a simple interface for combined spatio-temporal-semantic queries.

To use this enhanced graph:

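import pandas as pd
from shapely.geometry import Point

graph = SpatioTemporalSemanticGraph()
graph.add_node("1", Point(0, 0), pd.Timestamp("2023-01-01"), "Sample text")
# Restrict semantic matches to a 1-unit buffer around the origin and a two-year window
results = graph.search("sample",
                       geometry=Point(0, 0).buffer(1),
                       start_time=pd.Timestamp("2022-01-01"),
                       end_time=pd.Timestamp("2024-01-01"))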

This approach extends TxtAI's capabilities while maintaining simplicity and integration with its existing ecosystem.
