
curl scripts llama swap

Benson Wong edited this page Jul 24, 2025 · 2 revisions

About

  • Examples of using curl to test llama-swap endpoints.
  • Change the IP address and port (10.0.1.50:8080 below) to match your local setup.
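Rather than editing the address in every command, you can keep it in one shell variable. This is a minimal POSIX sh sketch. The variable name LLAMA_SWAP_URL and the endpoint helper are names chosen for this example; llama-swap itself does not read them.

```shell
#!/bin/sh
# LLAMA_SWAP_URL is a hypothetical variable name chosen for this sketch.
# It falls back to the address used throughout this page when unset.
LLAMA_SWAP_URL="${LLAMA_SWAP_URL:-http://10.0.1.50:8080}"

# endpoint PATH: print the full URL for an API path, dropping a trailing
# slash on the base URL so "//v1/..." never appears.
endpoint() {
  printf '%s%s\n' "${LLAMA_SWAP_URL%/}" "$1"
}

endpoint /v1/chat/completions

# Usage:
#   curl -s "$(endpoint /v1/embeddings)" \
#     -H "Content-Type: application/json" \
#     -d '{"model": "embedding", "input": "the text to embed"}'
```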

/v1/chat/completions

non-streaming

curl -s http://10.0.1.50:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model":"llama-8B",
         "max_tokens":200, 
         "messages": [{"role": "user","content": "write a story about puppies"}]
        }' 
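To send other prompts without hand-editing the JSON, you can generate the request body. This is a minimal sketch with hypothetical helper names (json_escape, chat_payload). It escapes only backslashes and double quotes, so prompts containing newlines or other control characters still need a real JSON tool such as jq.

```shell
#!/bin/sh
# json_escape TEXT: escape backslashes and double quotes for a JSON string.
# Sketch only: newlines, tabs and other control characters are NOT escaped.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# chat_payload MODEL MAX_TOKENS PROMPT: print a non-streaming
# /v1/chat/completions body with a single user message.
chat_payload() {
  printf '{"model":"%s","max_tokens":%s,"messages":[{"role":"user","content":"%s"}]}\n' \
    "$(json_escape "$1")" "$2" "$(json_escape "$3")"
}

chat_payload llama-8B 200 'write a story about puppies'

# Usage:
#   curl -s http://10.0.1.50:8080/v1/chat/completions \
#     -H "Content-Type: application/json" \
#     -d "$(chat_payload llama-8B 200 'write a haiku about "cats"')"
```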

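streaming

Set "stream":true in the body and pass -N (--no-buffer) so curl prints each chunk as it arrives.

curl -sN http://10.0.1.50:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model":"llama-8B",
         "max_tokens":200,
         "stream":true,
         "messages": [{"role": "user","content": "write a story about puppies"}]
        }'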

streaming with jq filtering of token stream

curl -sN http://10.0.1.50:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model":"devstral","max_tokens":200, "timings_per_token":true, "stream":true, "messages": [{"role": "user","content": "write hello world in golang"}]}' \
    | jq -cR 'sub("^data: "; "") | fromjson? | {c: .choices[0].delta.content, tps: .timings.predicted_per_second}'
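The jq filter above strips each "data: " prefix. fromjson? then silently drops anything that is not JSON, including the final data: [DONE] line of an OpenAI-style stream. If jq is not installed, you can do the same framing step in plain POSIX sh. This is a sketch: sse_payloads is a hypothetical helper name, and it assumes the stream ends with the OpenAI-style data: [DONE] terminator.

```shell
#!/bin/sh
# sse_payloads: read a server-sent-events stream on stdin and print the
# JSON payload of every "data: " line, one per line.
# Stops at the OpenAI-style "data: [DONE]" terminator; blank lines and
# other SSE fields are skipped.
sse_payloads() {
  cr=$(printf '\r')
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%"$cr"}              # tolerate CRLF line endings
    case $line in
      'data: [DONE]') break ;;
      'data: '*) printf '%s\n' "${line#"data: "}" ;;
    esac
  done
}

# Usage:
#   curl -sN http://10.0.1.50:8080/v1/chat/completions \
#     -H "Content-Type: application/json" \
#     -d '{"model":"devstral","stream":true,"messages":[{"role":"user","content":"hi"}]}' \
#     | sse_payloads
```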

/v1/embeddings

curl -s http://10.0.1.50:8080/v1/embeddings \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"model": "embedding", "input": "the text to embed"}'
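The example embeds a single string. OpenAI-compatible embedding endpoints also accept an array of strings in "input" and return one vector per item. Assuming the model configured behind llama-swap supports that, you can build the body from command-line arguments. embeddings_payload is a hypothetical helper, and its escaping covers only backslashes and double quotes.

```shell
#!/bin/sh
# embeddings_payload MODEL TEXT...: print an /v1/embeddings body whose
# "input" is a JSON array with one string per TEXT argument.
# Sketch only: escapes backslashes and double quotes, not control characters;
# MODEL is inserted as-is.
embeddings_payload() {
  model=$1
  shift
  sep=''
  printf '{"model":"%s","input":[' "$model"
  for text in "$@"; do
    escaped=$(printf '%s' "$text" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    printf '%s"%s"' "$sep" "$escaped"
    sep=','
  done
  printf ']}\n'
}

embeddings_payload embedding 'the text to embed' 'a second text'
```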

/v1/rerank, /v1/reranking, /rerank

curl -s http://10.0.1.50:8080/v1/rerank \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "reranker",
        "query": "What is the best way to learn Python?",
        "documents": [
          "Python is a popular programming language used for web development and data analysis.",
          "The best way to learn Python is through online courses and practice.",
          "Python is also used for artificial intelligence and machine learning applications.",
          "To learn Python, start with the basics and build small projects to gain experience."
        ],
        "max_reranked": 2
      }' | jq .;
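For longer candidate lists, it can be easier to keep one document per line in a text file. This is a minimal sketch with some limits. rerank_payload and docs.txt are hypothetical names. The escaping covers only backslashes and double quotes. The max_reranked field from the example above is left out.

```shell
#!/bin/sh
# json_escape TEXT: escape backslashes and double quotes for a JSON string.
# Sketch only: newlines, tabs and other control characters are NOT escaped.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# rerank_payload MODEL QUERY < FILE: print a rerank request body whose
# "documents" array holds one entry per non-empty line of stdin.
rerank_payload() {
  printf '{"model":"%s","query":"%s","documents":[' \
    "$(json_escape "$1")" "$(json_escape "$2")"
  sep=''
  while IFS= read -r doc || [ -n "$doc" ]; do
    [ -n "$doc" ] || continue   # skip blank lines
    printf '%s"%s"' "$sep" "$(json_escape "$doc")"
    sep=','
  done
  printf ']}\n'
}

# Usage (curl's -d @- reads the body from stdin):
#   rerank_payload reranker 'What is the best way to learn Python?' < docs.txt |
#     curl -s http://10.0.1.50:8080/v1/rerank \
#       -H 'Content-Type: application/json' -d @- | jq .
```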

/v1/audio/transcriptions

Notes:

  • jfk.wav is one of the sample files shipped with whisper.cpp (samples/jfk.wav).
  • /v1/audio/transcriptions takes a multipart/form-data POST, not a JSON body.

curl -s 10.0.1.50:8080/v1/audio/transcriptions \
    -H "Content-Type: multipart/form-data" \
    -F file="@jfk.wav" \
    -F temperature="0.0" \
    -F temperature_inc="0.2" \
    -F response_format="json" \
    -F model="whisper";
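Each request is an ordinary multipart/form-data upload, so you can transcribe several recordings in a loop. This is a minimal sketch. transcribe_all and the LLAMA_SWAP_URL variable are hypothetical names, and the form fields are copied from the example above. curl's -F builds the multipart body and its Content-Type header (boundary included) on its own.

```shell
#!/bin/sh
# LLAMA_SWAP_URL is a hypothetical variable; it defaults to this page's address.
LLAMA_SWAP_URL="${LLAMA_SWAP_URL:-http://10.0.1.50:8080}"

# transcribe_all FILE...: POST each existing audio file to
# /v1/audio/transcriptions, printing a "== FILE" header before each result.
transcribe_all() {
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      printf 'skipping %s: not a regular file\n' "$f" >&2
      continue
    fi
    printf '== %s\n' "$f"
    # -F makes curl send multipart/form-data; no explicit header needed.
    curl -s "$LLAMA_SWAP_URL/v1/audio/transcriptions" \
      -F file="@$f" \
      -F temperature="0.0" \
      -F temperature_inc="0.2" \
      -F response_format="json" \
      -F model="whisper"
    printf '\n'
  done
}

# Usage: transcribe_all jfk.wav another.wav
```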