[Bug] How do I use my already built Qdrant vector DB with {'book_name', 'page', 'chunk', 'upload_date'} keys without adding them via agno through the content db #4646

@JoeMukherjeeAssessli

Description

I need to reuse my existing Qdrant vector DB. I cannot recreate the full DB: it holds tens of thousands of pages of PDFs.

Steps to Reproduce

@app.post("/v1/chat", response_model=ChatResponse)
async def chat(
    user_id: str = Form(...),
    session_id: str = Form(...),
    message: str = Form(...),
    files: Optional[List[UploadFile]] = File(None),
    book_names: Optional[str] = Form(None)  # Comma-separated list of book names for knowledge filtering
):
    """
    Unified chat endpoint supporting text messages, single file uploads, and multiple file uploads
    
    Args:
        user_id: User identifier
        session_id: Session identifier
        message: User's message/query
        files: Optional list of files to upload
        book_names: Optional comma-separated list of book names to filter knowledge search
                   Example: "CALLEN_S ULTRASOUND OBG 6th edition_Part14.pdf,Harrison's Internal Medicine 20th edition_Part5.pdf"
    """
    
    if not PIPELINE_AVAILABLE:
        raise HTTPException(status_code=503, detail="Pipeline not available")
    
    try:
        # Process book_names parameter for knowledge filtering
        knowledge_filters = None
        if book_names and book_names.strip():
            book_list = [book.strip() for book in book_names.split(',') if book.strip()]
            if book_list:
                knowledge_filters = {"book_name": book_list}
                logger.info(f"🔍 Knowledge filters applied: {knowledge_filters}")
        
        # Set allowed file types
        allowed_types = [
            'image/jpeg', 'image/jpg', 'image/png', 'image/gif', 'image/webp',
            'application/pdf', 'text/plain', 'application/msword',
            'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
        ]
        
        # Handle the case where no files are provided (None or an empty list)
        if not files:
            # Process text-only message
            agent = get_medical_agent()
            result = agent.process_message(
                message=message,
                user_id=user_id,
                session_id=session_id,
                knowledge_filters=knowledge_filters
            )
        elif len(files) == 1:
            # Single file case - use the original process_message_concurrent for backward compatibility
            file = files[0]
            
            # Validate file type
            if file.content_type not in allowed_types:
                raise HTTPException(
                    status_code=400,
                    detail=f"Unsupported file type: {file.content_type}. Supported types: {', '.join(allowed_types)}"
                )
            
            # Read file content
            file_content = await file.read()
            
            # Check file size (10MB limit)
            max_size = 10 * 1024 * 1024  # 10MB
            if len(file_content) > max_size:
                raise HTTPException(
                    status_code=400,
                    detail=f"File too large. Maximum size is {max_size / (1024*1024):.1f}MB"
                )

Agent Configuration (if applicable)

No response

Expected Behavior

The agent should search the knowledge base with the given filters applied, so that only chunks from the requested books are retrieved.
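
To make the expected filtering concrete, here is a minimal sketch of the payload filter I would expect to reach Qdrant, written in Qdrant's REST filter shape (`must` / `match.any`). The helper name `build_qdrant_filter` is hypothetical, and it assumes `book_name` is stored as a top-level payload key in the existing collection. Whether agno accepts or forwards a filter in this form is not confirmed here.

```python
from typing import Any, Dict, List


def build_qdrant_filter(book_names: List[str], key: str = "book_name") -> Dict[str, Any]:
    """Build a Qdrant REST-style filter that matches any of the given book names.

    Hypothetical helper: assumes `key` is a top-level payload field.
    Returns an empty dict when no usable names are given (no filtering).
    """
    # Drop empty entries and surrounding whitespace, as the endpoint does.
    names = [name.strip() for name in book_names if name and name.strip()]
    if not names:
        return {}
    return {"must": [{"key": key, "match": {"any": names}}]}


if __name__ == "__main__":
    print(build_qdrant_filter(["CALLEN_S ULTRASOUND OBG 6th edition_Part14.pdf"]))
```

A dict like this could be sent as the `filter` field of a Qdrant query request, which would restrict the search to points whose `book_name` is in the list.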

Actual Behavior

2025-09-17 13:33:07,081 - INFO - AFC remote call 1 is done.
WARNING Invalid filter keys provided: ['book_name']. These filters will be ignored.
INFO Valid filter keys are: set()
WARNING No valid filters remain after validation. Search will proceed without filters.
2025-09-17 13:33:11,227 - INFO - HTTP Request: POST https://3a7e5639-7584-4d45-9711-25e3deddc0b3.us-east4-0.gcp.cloud.qdrant.io:6333/collections/medical_knowledge_base/points/query "HTTP/1.1 200 OK"
INFO Found 10 documents
2025-09-17 13:33:11,518 - INFO - AFC is enabled with max remote calls: 10.
2025-09-17 13:33:26,230 - INFO - HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent "HTTP/1.1 200 OK"
2025-09-17 13:33:26,240 - WARNING - Warning: there are non-text parts in the response: ['thought_signature'], returning concatenated parsed result from text parts. Check the full candidates.content.parts accessor to get the full model response.
2025-09-17 13:33:26,241 - INFO - AFC remote call 1 is done.
2025-09-17 13:33:27,190 - INFO - ✅ Successfully parsed JSON response with message: Ultrasonography, commonly known as ultrasound, is a non-invasive imaging techniq...
2025-09-17 13:33:27,190 - INFO - ✅ Successfully parsed JSON response with message: Ultrasonography, commonly known as ultrasound, is a non-invasive imaging technique that uses high-fr...
INFO: 127.0.0.1:52354 - "POST /v1/chat HTTP/1.1" 200 OK
2025-09-17 13:33:27,192 - INFO - 🔍 Knowledge filters applied: {'book_name': ["Harrison's Internal Medicine 20th edition_Part5.pdf", 'CALLEN_S ULTRASOUND OBG 6th edition_Part14.pdf']}
2025-09-17 13:33:29,424 - INFO - AFC is enabled with max remote calls: 10.
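
Reading the warnings above, the filter keys seem to be checked against a set of known metadata keys, and that set is empty (`set()`), presumably because the points were not added through agno. The snippet below is only a toy model of that behavior as inferred from the log; `split_filters` is a hypothetical name and this is not agno's actual code.

```python
from typing import Any, Dict, List, Set, Tuple


def split_filters(
    filters: Dict[str, Any], valid_keys: Set[str]
) -> Tuple[Dict[str, Any], List[str]]:
    """Split requested filters into those with known keys and the rejected keys.

    Toy model of the validation suggested by the log, not library code.
    """
    valid = {key: value for key, value in filters.items() if key in valid_keys}
    invalid = [key for key in filters if key not in valid_keys]
    return valid, invalid


requested = {"book_name": ["CALLEN_S ULTRASOUND OBG 6th edition_Part14.pdf"]}

# With no known keys, every filter is rejected and the search runs unfiltered.
valid, invalid = split_filters(requested, set())

if __name__ == "__main__":
    print("valid:", valid)
    print("invalid:", invalid)
```

Under this reading, the filter would only survive if `book_name` were among the known keys, which is what I cannot achieve without re-ingesting the documents.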

Screenshots or Logs (if applicable)

No response

Environment

Windows

Possible Solutions (optional)

No response

Additional Context

No response
