This project is a lightweight REST API server built with FastAPI. It accepts an uploaded file, converts its contents to Markdown using the MarkItDown library, and returns the Markdown content.
Important: This project started as a fork of elbruno/MarkItDownServer.
Note: This project uses uv for dependency management and multistage Docker builds, significantly reducing build times and final image size.
The easiest way to get started is to use the pre-built image from GitHub Container Registry:
docker pull ghcr.io/dezoito/markitdown-api:latest
docker run -d --name markitdown-api -p 8490:8490 ghcr.io/dezoito/markitdown-api:latest
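Because the container starts in detached mode, the server may need a moment before it accepts connections. The following is a minimal sketch, not part of the project, of a helper that waits until the published port is reachable. The function name `wait_for_port` and its defaults are hypothetical. It only checks that a TCP connection succeeds, not that the API itself is healthy.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: it probes the socket only and knows nothing about
    the application listening behind it.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError if nothing is listening yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

With the container from the commands above running, `wait_for_port("localhost", 8490)` should return `True` once the port is open.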
- Clone the repository:
  git clone <repository-url>
- Navigate to the project directory:
  cd <project-dir>
- Build the Docker image:
  docker build -t markitdown-api:latest .
- Run the Docker container:
  docker run -d --name markitdown-api -p 8490:8490 markitdown-api:latest
For easier development, a convenience script is included to rebuild the image and restart the container:
- Make the script executable:
  chmod +x rebuild.sh
- Run the script whenever you make changes:
  ./rebuild.sh
The script will:
- Stop the running container
- Remove the container
- Build a fresh image
- Start a new container
- Verify the container is running
This simplifies the development process when you're making frequent changes to the codebase.
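The contents of rebuild.sh are not shown here, so the following is only a rough Python sketch of the five steps listed above, assuming the container and image names used in the earlier docker commands. The function names `rebuild_commands` and `rebuild` are hypothetical, and the real script may differ in its details.

```python
import subprocess

# Names taken from the docker commands earlier in this README
CONTAINER = "markitdown-api"
IMAGE = "markitdown-api:latest"
PORT = 8490


def rebuild_commands(container=CONTAINER, image=IMAGE, port=PORT):
    """Return the docker commands for each step, in order."""
    return [
        ["docker", "stop", container],                    # stop the running container
        ["docker", "rm", container],                      # remove the container
        ["docker", "build", "-t", image, "."],            # build a fresh image
        ["docker", "run", "-d", "--name", container,
         "-p", f"{port}:{port}", image],                  # start a new container
        ["docker", "ps", "--filter", f"name={container}"],  # list it to verify it is running
    ]


def rebuild(run=subprocess.run):
    """Run every step; tolerate failures of stop/rm (container may not exist yet)."""
    for index, command in enumerate(rebuild_commands()):
        run(command, check=index >= 2)
```

Passing a different `run` callable makes it possible to inspect the commands without invoking Docker.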
The API offers two main endpoints:
The documentation endpoint (FastAPI serves this at /docs by default) provides an interactive interface where you can:
- Read and explore the existing API endpoints
- View request/response schemas and examples
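The same information is available in machine-readable form. FastAPI publishes an OpenAPI schema at /openapi.json by default; whether this project keeps that default is an assumption. The sketch below lists the operations found in such a schema once it has been loaded into a dictionary. The helper name `list_operations` is hypothetical.

```python
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_operations(schema: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dictionary."""
    operations = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            # Path items can also hold non-method keys such as "parameters"
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations)


# Hand-written sample in the shape of an OpenAPI document, for illustration only
sample_schema = {"paths": {"/process_file": {"post": {"summary": "Convert a file"}}}}
print(list_operations(sample_schema))
```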
The /process_file endpoint accepts a POST request containing a file to convert to Markdown.
- Method: POST
- Content-Type: multipart/form-data
- Parameter: file (binary)
- Accepted file types: doc, docx, ppt, pptx, pdf, xls, xlsx, txt, csv, json
- Returns: JSON object with the converted Markdown content
For more information regarding valid file types, check the official MarkItDown project.
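A client can check a file's extension against the list above before uploading it. This is a minimal client-side sketch. The name `is_accepted` is hypothetical, and how the server itself validates uploads is not described here, so treat this as a convenience check only.

```python
from pathlib import Path

# Extensions copied from the "Accepted file types" list above
ACCEPTED_EXTENSIONS = {
    "doc", "docx", "ppt", "pptx", "pdf", "xls", "xlsx", "txt", "csv", "json",
}


def is_accepted(path: str) -> bool:
    """Return True if the file's final extension is in the accepted list."""
    extension = Path(path).suffix.lower().lstrip(".")
    return extension in ACCEPTED_EXTENSIONS
```

The comparison ignores case, so `report.PDF` passes, while a file without an extension does not.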
You can quickly test that the application is running by uploading a file via curl, like so:
curl -X POST -F "file=@path/to/mypdf.pdf" http://localhost:8490/process_file
The response body should be a JSON object like:
{ "markdown": "Your content written in markdown..." }
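If you capture the curl output as text, the Markdown can be pulled out with the standard library. This is a small sketch assuming the response has the shape shown above. The helper name `extract_markdown` is hypothetical.

```python
import json


def extract_markdown(body: str) -> str:
    """Parse a response body and return the value of its "markdown" field."""
    payload = json.loads(body)
    if not isinstance(payload, dict) or "markdown" not in payload:
        raise ValueError("response does not contain a 'markdown' field")
    return payload["markdown"]


# Sample body in the shape shown above
print(extract_markdown('{ "markdown": "Your content written in markdown..." }'))
```

Raising on a missing field, rather than returning `None`, makes an unexpected response easier to notice.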
Here's a very simple example in Python:
import requests

file_path = "/path/to/my.pdf"

with open(file_path, 'rb') as file:
    # Prepare the file for the multipart/form-data request
    files = {'file': (file_path, file)}
    # Make the POST request to the API while the file is still open
    response = requests.post("http://localhost:8490/process_file", files=files)

# Fail early on HTTP errors instead of parsing an error body
response.raise_for_status()
# Parse the JSON response
result = response.json()
# Extract the Markdown content
content = result.get('markdown')
This project was originally based on elbruno/MarkItDownServer by Bruno Capuano.
This project is licensed under the MIT License.