MLX INFERENCE is an OpenAI API-compatible inference service built on MLX-LM and MLX-VLM. It provides the following endpoints:
- `/v1/chat/completions` - Chat completions endpoint
- `/v1/responses` - Responses endpoint
- `/v1/models` - List available models
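Because the service is OpenAI API-compatible, any OpenAI client can talk to it once it is running (see the startup command below). A minimal sketch using the official `openai` Python package; the base URL matches the example port used later, and the model id is a hypothetical placeholder for whichever mlx-community model you serve:

```python
from openai import OpenAI

# Point the client at the local MLX INFERENCE service instead of api.openai.com.
# Base URL and model id below are illustrative assumptions, not fixed values.
client = OpenAI(
    base_url="http://localhost:8002/v1",
    api_key="not-needed",  # local service; the key is typically ignored
)

response = client.chat.completions.create(
    model="mlx-community/Meta-Llama-3-8B-Instruct-4bit",  # hypothetical model id
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```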
Install dependencies and set up the environment file:

```bash
pip install -r requirements.txt

# Copy environment file
cp .env.example .env
```
Execute in the project root directory:

```bash
uvicorn mlx_Inference:app --workers 1 --port 8002
```
Parameters:

- `--workers`: number of worker processes
- `--port`: service port number
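To confirm the service came up, you can query the model list. A quick sketch with `requests`, assuming the port above and an OpenAI-style list payload with a `data` array:

```python
import requests

# Hit the /v1/models endpoint of the local service
# (adjust the port if you changed --port at startup).
resp = requests.get("http://localhost:8002/v1/models")
resp.raise_for_status()
for model in resp.json()["data"]:  # OpenAI-style model list
    print(model["id"])
```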
- Compatible with the OpenAI API specification
- Backend inference runs on MLX-LM and MLX-VLM and supports mlx-community models
- Easy to deploy and use
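Since the service follows the OpenAI API specification, standard client features such as streaming should work unchanged. A hedged sketch under the same host, port, and model-id assumptions as above, assuming the service implements OpenAI-style streaming for chat completions:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8002/v1", api_key="not-needed")

# Request a streamed response; tokens arrive as incremental deltas.
stream = client.chat.completions.create(
    model="mlx-community/Meta-Llama-3-8B-Instruct-4bit",  # hypothetical model id
    messages=[{"role": "user", "content": "Write a haiku about Apple silicon."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```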