curl scripts llama swap

About

Examples using curl to test llama-swap endpoints.
Change the IP address and port (10.0.1.50:8080 in these examples) to match your local setup.
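To avoid editing every command, you can keep the base URL in a shell variable. Below is a minimal sketch. `LLAMA_SWAP_URL` and `endpoint` are hypothetical names used only here. They are not llama-swap settings.

```shell
# Hypothetical helper: LLAMA_SWAP_URL is an ordinary shell variable, not a llama-swap option.
# It defaults to the address used throughout this page.
LLAMA_SWAP_URL="${LLAMA_SWAP_URL:-http://10.0.1.50:8080}"

# endpoint PATH: print the full URL for PATH, dropping one trailing "/" from the base URL.
endpoint() {
  printf '%s%s\n' "${LLAMA_SWAP_URL%/}" "$1"
}
```

For example, after `LLAMA_SWAP_URL=http://localhost:8080`, the command `curl -s "$(endpoint /v1/embeddings)" ...` targets the local instance.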

/v1/chat/completions

non-streaming

curl -s http://10.0.1.50:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model":"llama-8B",
         "max_tokens":200, 
         "messages": [{"role": "user","content": "write a story about puppies"}]
        }'
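The non-streaming reply is a single JSON object. This sketch assumes the OpenAI chat-completion response shape, where the generated text sits at `.choices[0].message.content`. Under that assumption, a small jq wrapper prints just the text. `chat_text` is a hypothetical name.

```shell
# chat_text: print the assistant message from a non-streaming chat completion read on stdin.
# Assumes the OpenAI response shape {"choices":[{"message":{"content":...}}]}.
# "// empty" prints nothing instead of "null" when the field is missing.
chat_text() {
  jq -r '.choices[0].message.content // empty'
}
```

To use it, pipe the non-streaming curl command above into `chat_text`.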

streaming

curl -sN http://10.0.1.50:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model":"llama-8B",
         "max_tokens":200,
         "stream":true,
         "messages": [{"role": "user","content": "write a story about puppies"}]
        }'
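When `"stream": true` is set, the reply arrives as server-sent events. This sketch assumes the OpenAI-style framing that llama.cpp's server also uses:

- each chunk is a `data: {json}` line,
- events are separated by blank lines,
- the stream ends with `data: [DONE]`.

The minimal POSIX shell helper below unwraps those lines without jq. `sse_payloads` is a hypothetical name.

```shell
# sse_payloads: read an SSE stream on stdin and print each JSON payload on its own line.
# Assumes OpenAI-style framing: "data: <json>" lines, blank separators, "data: [DONE]" at the end.
sse_payloads() {
  local line cr
  cr=$(printf '\r')
  while IFS= read -r line; do
    line=${line%"$cr"}                              # tolerate CRLF line endings
    case $line in
      "data: [DONE]") break ;;                      # end of stream
      "data: "*) printf '%s\n' "${line#data: }" ;;  # strip the SSE prefix
      *) ;;                                         # ignore blank lines and other SSE fields
    esac
  done
}
```

Pipe `curl -sN ...` into `sse_payloads` to get one JSON chunk per line.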

streaming, with `jq` filtering of the token stream

curl -sN http://10.0.1.50:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model":"devstral",
         "max_tokens":200,
         "timings_per_token":true,
         "stream":true,
         "messages": [{"role": "user","content": "write hello world in golang"}]
        }' \
    | jq -cR 'sub("^data: "; "") | fromjson? | {c: .choices[0].delta.content, tps: .timings.predicted_per_second}'
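The filter above prints one JSON object per chunk. To read the answer as plain text instead, feed the same `sub`/`fromjson?` steps into jq's `-j` (join output) flag. That flag writes raw strings with no newline between them. Below is a sketch. `stream_text` is a hypothetical name. Chunks without `delta.content`, such as a role-only first chunk, are skipped by `// empty`.

```shell
# stream_text: turn a streamed chat completion (SSE lines on stdin) into continuous text.
# -R reads each line as a raw string; -j prints raw output without adding newlines.
# fromjson? silently drops lines that are not JSON, such as blank lines and "[DONE]".
stream_text() {
  jq -jR 'sub("^data: "; "") | fromjson? | .choices[0].delta.content // empty'
}
```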

/v1/embeddings

curl -s http://10.0.1.50:8080/v1/embeddings \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"model": "embedding", "input": "the text to embed"}'
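This sketch assumes the OpenAI-style embeddings response, `{"data":[{"embedding":[...], "index":0}]}`. Under that assumption, jq can pull out the vector or check its dimension. Both helper names below are hypothetical.

```shell
# embedding_vector: print the first embedding as a compact JSON array.
embedding_vector() {
  jq -c '.data[0].embedding'
}

# embedding_dim: print the number of dimensions of the first embedding.
embedding_dim() {
  jq '.data[0].embedding | length'
}
```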

/v1/rerank, /v1/reranking, /rerank

curl -s http://10.0.1.50:8080/v1/rerank \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "reranker",
        "query": "What is the best way to learn Python?",
        "documents": [
          "Python is a popular programming language used for web development and data analysis.",
          "The best way to learn Python is through online courses and practice.",
          "Python is also used for artificial intelligence and machine learning applications.",
          "To learn Python, start with the basics and build small projects to gain experience."
        ],
        "max_reranked": 2
      }' | jq .;
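This sketch assumes the upstream reranker is llama.cpp's server. Its response is assumed to contain a `results` array of `{"index": ..., "relevance_score": ...}` objects, where `index` points back into `documents`. Under that assumption, the hypothetical helper below prints index and score, best match first.

```shell
# rerank_ranked: print "index<TAB>relevance_score" for each result, highest score first.
# Assumes a response shaped like {"results":[{"index":0,"relevance_score":0.9}, ...]}.
rerank_ranked() {
  jq -r '.results | sort_by(.relevance_score) | reverse | .[] | "\(.index)\t\(.relevance_score)"'
}
```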

/v1/audio/transcriptions

Notes:

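`jfk.wav` is one of the sample audio files from whisper.cpp. The `@` in `-F file="@jfk.wav"` tells curl to upload the contents of that file.
The /v1/audio/transcriptions endpoint takes a multipart/form-data POST (not a JSON body), which is what curl's `-F` option sends.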

curl -s 10.0.1.50:8080/v1/audio/transcriptions \
    -H "Content-Type: multipart/form-data" \
    -F file="@jfk.wav" \
    -F temperature="0.0" \
    -F temperature_inc="0.2" \
    -F response_format="json" \
    -F model="whisper";
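The small wrapper below checks that the audio file exists before uploading it. It uses two hypothetical names, the function `transcribe` and the variable `LLAMA_SWAP_URL`. `-F` already makes curl send `multipart/form-data` with the correct boundary, so the wrapper leaves out the explicit `Content-Type` header. The `model` default of `whisper` mirrors the example above.

```shell
# transcribe FILE [MODEL]: upload FILE to the transcription endpoint as a multipart form.
# LLAMA_SWAP_URL is a hypothetical shell variable; it defaults to the address used on this page.
transcribe() {
  local file=$1 model=${2:-whisper}
  if [ ! -f "$file" ]; then
    printf 'transcribe: no such file: %s\n' "$file" >&2
    return 1
  fi
  # -F implies POST and sets "Content-Type: multipart/form-data; boundary=..." automatically
  curl -s "${LLAMA_SWAP_URL:-http://10.0.1.50:8080}/v1/audio/transcriptions" \
    -F file="@$file" \
    -F response_format="json" \
    -F model="$model"
}
```

Usage: `transcribe jfk.wav whisper | jq .`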
