A FastAPI application that exposes endpoints for Azure OpenAI's Responses API, covering text completion, conversation handling, image analysis, function calling, streaming, file search, and structured output.
- Basic text completion
- Conversation handling
- Image analysis (base64 and URL)
- Weather function calling
- Streaming responses (SSE and async)
- File search with vector store
- Structured output with JSON schema
- Python 3.9+
- Azure OpenAI API access
- Azure OpenAI model deployment
- Clone the repository:
git clone https://github.com/yourusername/Responses-API.git
cd Responses-API
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Set up environment variables:
cp .env.sample .env
Edit .env with your Azure OpenAI credentials:
AZURE_OPENAI_API_MODEL=your-model-deployment-name
AZURE_OPENAI_API_KEY=your-api-key
AZURE_OPENAI_API_ENDPOINT=https://your-resource-name.openai.azure.com/
AZURE_OPENAI_API_VERSION=2025-03-01-preview
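For reference, a minimal sketch of how the server might build its Azure OpenAI client from these variables (assuming the official openai and python-dotenv packages; the exact wiring in main.py may differ):

import os
from dotenv import load_dotenv   # assumes python-dotenv is installed
from openai import AzureOpenAI

load_dotenv()  # read the .env file created above

client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    azure_endpoint=os.environ["AZURE_OPENAI_API_ENDPOINT"],
)

# The deployment name is passed as the model on each request
model = os.environ["AZURE_OPENAI_API_MODEL"]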
python main.py
The server will start at http://localhost:8000. Access the interactive API documentation at http://localhost:8000/docs.
Basic text completion endpoint.
{
"prompt": "Complete this sentence: The quick brown fox"
}
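A client call might look like the following; the /completion path is a placeholder, so check http://localhost:8000/docs for the actual route:

import requests

resp = requests.post(
    "http://localhost:8000/completion",   # hypothetical route name
    json={"prompt": "Complete this sentence: The quick brown fox"},
)
print(resp.json())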
Conversation completion with message history.
{
"messages": [
{"role": "user", "content": "What is the capital of France?"}
]
}
Image analysis with base64-encoded image.
{
"prompt": "Describe this image",
"image": "base64_encoded_image_data"
}
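To produce the base64 payload from a local file, the standard library is enough (the /image-analysis route name is a placeholder; see /docs for the real path):

import base64
import requests

with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    "http://localhost:8000/image-analysis",   # hypothetical route name
    json={"prompt": "Describe this image", "image": image_b64},
)
print(resp.json())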
Image analysis with URL.
{
"prompt": "Describe this image",
"url": "https://example.com/image.jpg"
}
Weather information using function calling.
{
"location": "London",
"unit": "celsius"
}
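Under the hood this endpoint presumably declares a weather tool for the Responses API; a rough sketch of one plausible tool definition (not the project's actual code) is shown below, reusing the client and model from the configuration sketch above:

# Sketch only: one plausible function-tool definition for the Responses API
weather_tool = {
    "type": "function",
    "name": "get_current_weather",
    "description": "Get the current weather for a location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
    },
}

response = client.responses.create(
    model=model,
    input="What is the weather in London?",
    tools=[weather_tool],
)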
Basic streaming response.
{
"prompt": "Write a story about"
}
Server-Sent Events streaming.
{
"prompt": "Write a story about"
}
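On the client side, the SSE stream can be consumed with plain requests (the /stream-sse route name is a placeholder; see /docs for the real path):

import requests

with requests.post(
    "http://localhost:8000/stream-sse",   # hypothetical route name
    json={"prompt": "Write a story about"},
    stream=True,
) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if line and line.startswith("data: "):
            print(line[len("data: "):])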
Asynchronous streaming response.
{
"prompt": "Write a story about"
}
Streaming conversation response.
{
"messages": [
{"role": "user", "content": "Tell me a story"}
]
}
Response chaining using previous_response_id.
{
"input": "Explain this at a level that could be understood by a college freshman",
"previous_response_id": "resp_67cbc9705fc08190bbe455c5ba3d6daf"
}
Response:
{
"response_id": "resp_67cbc970fd0881908353a4298996b3f6",
"response": "Here's a simpler explanation..."
}
Manual response chaining using message history.
{
"inputs": [
{
"role": "user",
"content": "Define and explain the concept of catastrophic forgetting?"
},
{
"role": "assistant",
"content": "Catastrophic forgetting refers to..."
},
{
"role": "user",
"content": "Explain this at a level that could be understood by a college freshman"
}
]
}
Response:
{
"response_id": "resp_67cbc970fd0881908353a4298996b3f6",
"response": "Let me explain it simply...",
"full_message_history": [
// Previous messages plus the new response
]
}
Implementation Notes for Chained Responses:
- Use /chained-response when you want to:
  - Keep the context lightweight
  - Avoid modifying previous messages
  - Have a simple request/response flow
- Use /manual-chain when you want to:
  - Have full control over the message history
  - Modify or filter previous messages
  - Keep track of the full conversation history
Example Usage:
import requests

# First request to get initial response
response1 = requests.post(
    "https://api.example.com/chained-response",
    json={"input": "Define quantum computing"}
)
first_response = response1.json()

# Second request using previous response ID
response2 = requests.post(
    "https://api.example.com/chained-response",
    json={
        "input": "Explain it more simply",
        "previous_response_id": first_response["response_id"]
    }
)
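A similar sketch for /manual-chain, assuming the returned full_message_history can be extended and sent back as inputs on the next call:

import requests

BASE = "https://api.example.com"

# First turn
r1 = requests.post(
    f"{BASE}/manual-chain",
    json={"inputs": [{"role": "user", "content": "Define quantum computing"}]},
)
history = r1.json()["full_message_history"]

# Second turn: reuse (or filter) the history, then append a new user message
history.append({"role": "user", "content": "Explain it more simply"})
r2 = requests.post(f"{BASE}/manual-chain", json={"inputs": history})
print(r2.json()["response"])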
Vector store-based file search for smaller files.
{
"query": "What are the company values?",
"file_paths": ["document1.pdf", "document2.pdf"],
"max_results": 20,
"chunk_size": 1048576 // Optional, default 1MB
}
Response:
{
"response": "Based on the documents, the company values include..."
}
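A minimal client call using the fields shown above (file paths are resolved on the server, so point them at files the server can read):

import requests

resp = requests.post(
    "http://localhost:8000/filesearch",
    json={
        "query": "What are the company values?",
        "file_paths": ["document1.pdf", "document2.pdf"],
        "max_results": 20,
    },
)
print(resp.json()["response"])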
Chunked processing for large files with progress tracking.
{
"query": "What are the company policies?",
"file_paths": ["large_handbook.pdf"],
"max_results": 5,
"chunk_size": 524288, // Optional, default 1MB (1024 * 1024)
"batch_size": 2 // Optional, default 5 chunks per batch
}
Response:
{
"search_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "completed",
"response": "Summary of company policies found in the document..."
}
Track progress of large file processing.
Response:
{
"search_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "processing",
"progress_percentage": 45.5,
"processed_chunks": 5,
"total_chunks": 11
}
Implementation Notes:
- Status values: "initializing", "processing", "completed", "failed"
- Progress tracking is maintained server-side
- Files are processed in chunks to manage memory
- Results are automatically summarized
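For Python clients, the same polling pattern as the JavaScript example further below might look like this:

import time
import requests

def monitor_progress(search_id, base_url="http://localhost:8000"):
    # Poll the progress endpoint until the search completes or fails
    while True:
        progress = requests.get(
            f"{base_url}/large-filesearch/{search_id}/progress"
        ).json()
        if progress["status"] in ("completed", "failed"):
            return progress
        print(f"Progress: {progress['progress_percentage']}%")
        time.sleep(1)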
Structured output with JSON schema validation.
{
"input": "Extract event: Meeting with John on Monday at 2 PM",
"json_schema": {
"type": "object",
"properties": {
"event": {"type": "string"},
"person": {"type": "string"},
"day": {"type": "string"},
"time": {"type": "string"}
}
}
}
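On the client side, the returned object can be re-checked against the same schema, e.g. with the jsonschema package; the /structured-output route and the response shape are assumptions, so adjust to what /docs reports:

import requests
from jsonschema import validate   # pip install jsonschema

schema = {
    "type": "object",
    "properties": {
        "event": {"type": "string"},
        "person": {"type": "string"},
        "day": {"type": "string"},
        "time": {"type": "string"},
    },
}

resp = requests.post(
    "http://localhost:8000/structured-output",   # hypothetical route name
    json={
        "input": "Extract event: Meeting with John on Monday at 2 PM",
        "json_schema": schema,
    },
)
result = resp.json()   # assumed to be the extracted object itself
validate(instance=result, schema=schema)   # raises ValidationError on mismatch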
All endpoints include proper error handling and will return appropriate HTTP status codes:
- 200: Successful response
- 400: Bad request (invalid input)
- 404: Resource not found (invalid search_id)
- 500: Server error (Azure OpenAI API issues)
- Use /filesearch for files < 1MB
- Use /large-filesearch for files > 1MB
- Monitor progress using the /large-filesearch/{search_id}/progress endpoint
- Consider batch_size based on your server's capabilities:
  - Lower batch_size (1-2): Less memory usage, slower processing
  - Higher batch_size (5-10): More memory usage, faster processing
- File Size Handling:

  import os

  # Check file size before processing
  file_size = os.path.getsize(file_path)
  if file_size > 1024 * 1024:  # larger than 1MB
      use_large_filesearch = True
- Progress Monitoring:

  // JavaScript example
  async function monitorProgress(searchId) {
    while (true) {
      const response = await fetch(`/large-filesearch/${searchId}/progress`);
      const progress = await response.json();
      if (progress.status === 'completed' || progress.status === 'failed') {
        break;
      }
      console.log(`Progress: ${progress.progress_percentage}%`);
      await new Promise(resolve => setTimeout(resolve, 1000));
    }
  }
- Error Handling:

  // JavaScript example
  try {
    const response = await fetch('/large-filesearch', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        query: "search query",
        file_paths: ["large_file.pdf"],
        chunk_size: 524288,
        batch_size: 2
      })
    });
    if (!response.ok) {
      const error = await response.json();
      console.error(`Error: ${error.detail}`);
    }
  } catch (error) {
    console.error('Network error:', error);
  }
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.