Releases: cubist38/mlx-openai-server

v1.0.11

12 Jun 06:17

Release Summary

  1. Updated README.md to streamline content and enhance clarity for easier understanding.
  2. Merged Pull Request #27 (contributed by @koush) to remove an unused dependency, reducing potential bloat and improving maintainability.

v1.0.10

09 Jun 04:49

Summary of Changes

  • Updated to the latest versions of mlx-lm and mlx-vlm to ensure compatibility and access to recent improvements.
  • Reduced the default image resolution from 512 to 448 to improve performance.

v1.0.9

31 May 15:23

Summary of Changes

  • Implemented OpenAI-Standard Streaming with Tool Calls and Thinking Parser: Fully integrated streaming support for tool calls, including a custom thinking parser, in compliance with OpenAI’s latest API standards. Implementation by @tienthanh214 (#25).
  • Logging System Migration: Replaced the standard logging module with loguru to improve readability, flexibility, and ease of debugging.
  • Demo Video Update: Updated the demo video in README.md to reflect the latest features and behavior of the current codebase.
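The streaming tool-call flow can be sketched with a plain OpenAI-style HTTP request. This is a minimal sketch, not code from this repository: the base URL, model name, and the `get_weather` tool are illustrative assumptions.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumed local server address; adjust to yours

def build_streaming_tool_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload with streaming and one tool."""
    return {
        "model": model,
        "stream": True,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool, for illustration only
                    "description": "Look up the weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }

def stream_chat(payload: dict):
    """POST the payload and yield parsed SSE data chunks (requires a running server)."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            line = raw.decode().strip()
            if line.startswith("data: ") and line != "data: [DONE]":
                yield json.loads(line[len("data: "):])

payload = build_streaming_tool_request("qwen3", "What is the weather in Hanoi?")
```

Per the OpenAI streaming convention, a chunk whose choice carries `finish_reason == "tool_calls"` marks the end of a tool-call stream.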

v1.0.8

24 May 15:25

Added a concise, informative PyPI package description that highlights the key features and purpose of the mlx-openai-server package while remaining clear and engaging.

v1.0.7

24 May 15:15
1ad17af

Summary of Changes

  • Temporarily removed metrics.py pending a better implementation.
  • Aligned all Pydantic model fields with the OpenAI schema for consistency.
  • Updated finish_reason from "function_call" to "tool_calls" in streaming responses involving tool usage.
  • Fixed parsing of tool calls for Qwen3 models.
  • Resolved issues with VLM models when handling non-streaming text requests.
  • Added torchvision to setup.py — required by some VLM models for image processing.

v1.0.6

20 May 15:30
35b7818

Summary

This release tag marks the latest stable version of the codebase. Key updates in this release:

  • New Feature: Introduced the /v1/models endpoint for monitoring model serving status.
  • Updates: Synced with the latest versions of mlx_vlm and mlx_lm for up-to-date performance and compatibility.
  • Bug Fix: Fixed a text extraction issue when processing chunks.
  • Enhancement: Refined the resource cleanup logic for improved efficiency and stability.
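The new `/v1/models` endpoint follows OpenAI's list-models response shape. A minimal sketch of querying it, assuming a server at a local address (the base URL and the sample model id below are assumptions, not values from this release):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumed local server address; adjust to yours

def list_model_ids(models_response: dict) -> list:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [m["id"] for m in models_response.get("data", [])]

def fetch_models() -> dict:
    """GET /v1/models from a running server."""
    with urllib.request.urlopen(f"{BASE_URL}/models") as resp:
        return json.load(resp)

# Response shape per OpenAI's list-models schema (sample body, not a real server reply):
sample = {"object": "list", "data": [{"id": "mlx-community/Qwen3-4B", "object": "model"}]}
```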

v1.0.5

13 May 09:59

BREAKING CHANGE: Rename package and CLI from mlx-server to mlx-openai-server

Summary

This PR introduces a breaking change by renaming the package and CLI from mlx-server to mlx-openai-server to resolve PyPI naming conflicts and improve compatibility.

Changes

  • Renamed the package in setup.py from mlx-server to mlx-openai-server
  • Updated all CLI references from mlx-server to mlx-openai-server
  • Updated the repository/package name in the README and all usage instructions
  • Added a "Breaking Change" notice to the README

Impact

Breaking change:
All users must update their scripts, CLI commands, and installation instructions to use mlx-openai-server instead of mlx-server.

Motivation

The original name mlx-server was too similar to existing projects on PyPI, causing upload errors. This change ensures uniqueness and future compatibility.

Migration

  • Replace all usage of mlx-server with mlx-openai-server in your scripts and CLI commands.
  • Update any installation instructions to use the new package name.

v1.0.4

13 May 09:57

Automated package releases with a GitHub Actions workflow.

v1.0.3

04 May 09:49

Key Enhancements in This Release:

  • Function Calling Support for Qwen3: Added support for function calling with the Qwen3 model, the latest addition to Alibaba Model Studio’s Qwen family of large language models.
  • Embeddings Endpoint: Introduced the /v1/embeddings endpoint for both LM and VLM, enabling embedding generation.
  • Improved OpenAI Compatibility: Refactored the schema to enhance compatibility with OpenAI’s API format.
  • RAG Demo Notebook: Added simple_rag_demo.ipynb, showcasing an engaging use case for serving local models using mlx-server-OAI-compat.
  • Consistent Error Handling: Standardized all error responses across the codebase using create_error_response for a unified API error format.

v1.0.2

20 Apr 13:45
d34536f

Changes in this release:

  • Refactored API schemas and response formats for improved consistency and maintainability.
  • Updated chat history handling logic for better performance and reliability.
  • Exposed the /v1/embeddings endpoint to support MLX-LM models (text-only).
  • Added a new notebook, embeddings_examples, demonstrating how to use the embeddings endpoint via the OpenAI-compatible API.