Releases · cubist38/mlx-openai-server
v1.0.11
v1.0.10
v1.0.9
Summary of Changes
- Implemented OpenAI-Standard Streaming with Tool Calls and Thinking Parser: Fully integrated streaming support for tool calls, including a custom thinking parser, in compliance with OpenAI’s latest API standards; a client-side sketch follows this list. Implementation by @tienthanh214 (#25).
- Logging System Migration: Replaced the standard logging module with loguru to improve readability, flexibility, and ease of debugging.
- Demo Video Update: Updated the demo video in README.md to reflect the latest features and behavior of the current codebase.
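As a rough client-side illustration of the new streaming behavior, the sketch below consumes streamed tool calls through the OpenAI Python SDK. The base URL, port, model id, and tool definition are assumptions for the example, not values mandated by this release.

```python
# Hedged sketch: consuming streamed tool calls via the OpenAI Python SDK.
# The base_url/port, model id, and tool schema are illustrative assumptions,
# not values required by mlx-openai-server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="local-model",  # placeholder model id
    messages=[{"role": "user", "content": "What is the weather in Hanoi?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    stream=True,
)

# Tool-call arguments arrive incrementally; accumulate them per call index.
args = {}
for chunk in stream:
    if not chunk.choices:
        continue
    choice = chunk.choices[0]
    for tc in choice.delta.tool_calls or []:
        args[tc.index] = args.get(tc.index, "") + (tc.function.arguments or "")
    if choice.finish_reason:  # "tool_calls" when the model requested a tool
        print(choice.finish_reason, args)
```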
v1.0.8
v1.0.7
Summary of Changes
- Temporarily removed metrics.py pending a better implementation.
- Aligned all Pydantic model fields with the OpenAI schema for consistency.
- Updated finish_reason from "function_call" to "tool_calls" in streaming responses involving tool usage (a minimal schema sketch follows this list).
- Fixed parsing of tool calls for Qwen3 models.
- Resolved issues with VLM models when handling non-streaming text requests.
- Added torchvision to setup.py, required by some VLM models for image processing.
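To make the finish_reason change concrete, here is a minimal, hypothetical Pydantic sketch of an OpenAI-aligned streaming choice; the class and field layout are illustrative, not this project's actual models.

```python
# Hypothetical sketch (not this project's actual models): a Pydantic streaming
# choice aligned with the OpenAI schema, where tool usage now terminates with
# finish_reason "tool_calls" instead of the legacy "function_call".
from typing import Literal, Optional
from pydantic import BaseModel

class StreamChoice(BaseModel):
    index: int
    delta: dict  # partial message content and/or tool-call fragments
    finish_reason: Optional[
        Literal["stop", "length", "tool_calls", "content_filter"]
    ] = None
```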
v1.0.6
Summary
This release tags the latest stable version of the codebase. Key updates included in this release:
- New Feature: Introduced the /v1/models endpoint for monitoring model serving status (a query sketch follows this list).
- Updates: Synced with the latest versions of mlx_vlm and mlx_lm for up-to-date performance and compatibility.
- Bug Fix: Fixed a text extraction issue when processing chunks.
- Enhancement: Refined the resource cleanup logic for improved efficiency and stability.
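The new endpoint can be queried with the standard OpenAI client; the base URL and port in this sketch are assumptions for a locally running server.

```python
# Sketch: listing served models through the OpenAI-compatible /v1/models
# endpoint. The base_url/port are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

for model in client.models.list():
    print(model.id)
```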
v1.0.5
BREAKING CHANGE: Rename package and CLI from mlx-server to mlx-openai-server
Summary
This PR introduces a breaking change by renaming the package and CLI from mlx-server to mlx-openai-server to resolve PyPI naming conflicts and improve compatibility.
Changes
- Renamed the package in setup.py from mlx-server to mlx-openai-server
- Updated all CLI references from mlx-server to mlx-openai-server
- Updated the repository/package name in the README and all usage instructions
- Added a "Breaking Change" notice to the README
Impact
Breaking change: all users must update their scripts, CLI commands, and installation instructions to use mlx-openai-server instead of mlx-server.
Motivation
The original name mlx-server was too similar to existing projects on PyPI, causing upload errors. This change ensures uniqueness and future compatibility.
Migration
- Replace all usage of mlx-server with mlx-openai-server in your scripts and CLI commands.
- Update any installation instructions to use the new package name.
v1.0.4
v1.0.3
Key Enhancements in This Release:
- Function Calling Support for Qwen3: Added support for function calling with the Qwen3 model, the latest addition to Alibaba Model Studio’s Qwen family of large language models.
- Embeddings Endpoint: Introduced the /v1/embeddings endpoint for both LM and VLM models, enabling embedding generation (an example follows this list).
- Improved OpenAI Compatibility: Refactored the schema to enhance compatibility with OpenAI’s API format.
- RAG Demo Notebook: Added simple_rag_demo.ipynb, showcasing an engaging use case for serving local models using mlx-server-OAI-compat.
- Consistent Error Handling: Standardized all error responses across the codebase using create_error_response for a unified API error format.
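A minimal sketch of calling the new endpoint with the OpenAI client, assuming a local server and a placeholder model id:

```python
# Sketch: generating embeddings via the OpenAI-compatible /v1/embeddings
# endpoint. base_url/port and model id are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.embeddings.create(
    model="local-model",  # placeholder model id
    input=["MLX makes on-device inference easy."],
)
print(len(resp.data[0].embedding))  # dimensionality of the returned vector
```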
v1.0.2
Changes in this release:
- Refactored API schemas and response formats for improved consistency and maintainability.
- Updated chat history handling logic for better performance and reliability.
- Exposed the /v1/embeddings endpoint to support MLX-LM models (text-only).
- Added a new notebook, embeddings_examples, demonstrating how to use the embeddings endpoint via the OpenAI-compatible API.