Description
crawl4ai version
0.7.6
Expected Behavior
Adaptive crawler configured with:
strategy="embedding",
embedding_model="openai/text-embedding-3-small",
embedding_llm_config=LLMConfig(
    provider="openai/gpt-4o-mini",  # openai/text-embedding-3-small fails here too
    api_token=OPENAI_API_KEY,
),
Query:
'capabilities, services,
certifications, description of work,
products'
Query Expansion:
Original query expanded to 12 variations
- Where can I compare prices for various products?
- What new products have been launched this year?
- What products are recommended for pet owners?
- What are the must-have products for outdoor activities?
Current Behavior
• Query variations are replaced by a hard-coded “fried rice” list.
• embedding_llm_config is reused for both generation and embeddings, so the wrong provider/model can hit the wrong API:
  • Chat model sent to the embeddings endpoint → 403.
  • Embedding model used as a “provider” for text generation → failures or zero variations.
• Embedding dimension sometimes mismatches the configured embedding_model.
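The separation described in the bullets above can be sketched as follows. This is only an illustration of the idea, not crawl4ai's code: `ModelRouting` and `model_for` are hypothetical names, and the two model strings are the ones from the config in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRouting:
    """Hypothetical container (not a crawl4ai class) that keeps the two models apart."""
    generation_model: str  # used only for query expansion (chat completion)
    embedding_model: str   # used only for embedding requests


def model_for(task: str, routing: ModelRouting) -> str:
    """Return the model for a task; never fall back from one role to the other."""
    if task == "expand_query":
        return routing.generation_model
    if task == "embed":
        return routing.embedding_model
    raise ValueError(f"unknown task: {task!r}")


routing = ModelRouting(
    generation_model="openai/gpt-4o-mini",
    embedding_model="openai/text-embedding-3-small",
)
```

With this shape, a chat model can never be sent to the embeddings endpoint by accident, because each call site asks for its model by role.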
Is this reproducible?
Yes
Inputs Causing the Bug
Line 700: the map_query_semantic_space function uses leftover mock data and does not use the correct model for expansion or embedding.
Steps to Reproduce
A) Hard-coded query variations:
1. Use strategy="embedding" and call AdaptiveCrawler.digest(...).
2. Observe variations list: always food-related (“fried rice…”) regardless of query.
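One way to confirm step 2 is to compare the variations returned for unrelated queries. The helper below is a hypothetical check, not part of crawl4ai; it assumes you have already collected the variation lists per query yourself (for example from the printed stats).

```python
def variations_ignore_query(variations_by_query: dict[str, list[str]]) -> bool:
    """Return True if two different queries produced the same non-empty variation list."""
    seen: set[tuple[str, ...]] = set()
    for variations in variations_by_query.values():
        if not variations:
            continue  # empty lists (zero variations) are a separate symptom
        key = tuple(variations)
        if key in seen:
            return True
        seen.add(key)
    return False
```

If this returns True for queries on unrelated topics, the expansion step is not using the query at all.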
B) 403 when embeddings are requested:
1. Create the config:
`AdaptiveConfig(
strategy="embedding",
embedding_model="openai/text-embedding-3-small",
embedding_llm_config=LLMConfig(
provider="openai/gpt-4o-mini",
api_token=OPENAI_API_KEY,
),
n_query_variations=12,
)`
2. Run digest(...).
3. Intermittently see `403 - You are not allowed to generate embeddings from this model`, or end up with variations: 0 and embedding dimensions/behavior inconsistent with the configured model.
Code snippets
adaptive = AdaptiveCrawler(crawler, adaptive_cfg)
result = await adaptive.digest(start_url=start_url, query=query)
Or simply run the adaptive crawler example available in the Crawl4AI repository.
OS
macOS
Python version
3.13.5
Browser
No response
Browser version
No response
Error logs & Screenshots (if applicable)
litellm.exceptions.BadRequestError: litellm.BadRequestError: OpenAIException - Error code: 403 - {'error': {'message': 'You are not allowed to generate embeddings from this model', 'type': 'invalid_request_error', 'param': None, 'code': None}}
and
Adaptive Crawl Stats - Query:
'capabilities, services,
certifications, description of work,
products'
Query Expansion:
Original query expanded to 4 variations
- how to add flavor to vegetable fried rice?
- what are the best vegetables to use in fried rice?
- are there any tips for making healthy fried rice with vegetables?
- how do I make vegetable fried rice from scratch?
...