Added the ability to call openai compatible api #14
base: main
Conversation
go test -bench=. -timeout=0
time=2025-08-29T10:22:58.423-04:00 level=INFO msg="Inserting file" rag=light path=docs/christmascarol.txt
time=2025-08-29T10:22:58.524-04:00 level=INFO msg="Upserting sources" rag=light package=golightrag function=Insert count=8
time=2025-08-29T10:22:58.533-04:00 level=INFO msg="Extracting entities" rag=light package=golightrag function=Insert count=8
time=2025-08-29T10:23:25.934-04:00 level=WARN msg="Retry parse result" rag=light package=golightrag function=Insert retry=1 error="failed to parse llm result: invalid character '<' looking for beginning of value"
time=2025-08-29T10:23:52.162-04:00 level=WARN msg="Retry parse result" rag=light package=golightrag function=Insert retry=1 error="failed to parse llm result: invalid character '<' looking for beginning of value"
time=2025-08-29T10:23:58.534-04:00 level=WARN msg="Retry extract" rag=light package=golightrag function=Insert retry=1 error="failed to call LLM: error sending request: Post \"http://localhost:1234/v1/chat/completions\": context deadline exceeded"
time=2025-08-29T10:23:58.534-04:00 level=WARN msg="Retry extract" rag=light package=golightrag function=Insert retry=1 error="failed to call LLM: error sending request: Post \"http://localhost:1234/v1/chat/completions\": context deadline exceeded"
time=2025-08-29T10:23:58.534-04:00 level=WARN msg="Retry extract" rag=light package=golightrag function=Insert retry=1 error="failed to call LLM: error sending request: Post \"http://localhost:1234/v1/chat/completions\": context deadline exceeded"
time=2025-08-29T10:24:21.802-04:00 level=WARN msg="Retry parse result" rag=light package=golightrag function=Insert retry=2 error="failed to parse llm result: invalid character '<' looking for beginning of value"
time=2025-08-29T10:24:34.413-04:00 level=WARN msg="Retry parse result" rag=light package=golightrag function=Insert retry=2 error="failed to parse llm result: invalid character '<' looking for beginning of value"
time=2025-08-29T10:24:43.889-04:00 level=WARN msg="Retry parse result" rag=light package=golightrag function=Insert retry=2 error="failed to parse llm result: invalid character '<' looking for beginning of value"
time=2025-08-29T10:24:55.717-04:00 level=WARN msg="Retry parse result" rag=light package=golightrag function=Insert retry=2 error="failed to parse llm result: invalid character '<' looking for beginning of value"
time=2025-08-29T10:25:01.537-04:00 level=WARN msg="Retry extract" rag=light package=golightrag function=Insert retry=2 error="failed to call LLM: error sending request: Post \"http://localhost:1234/v1/chat/completions\": context deadline exceeded"
time=2025-08-29T10:25:14.906-04:00 level=WARN msg="Retry parse result" rag=light package=golightrag function=Insert retry=3 error="failed to parse llm result: invalid character '<' looking for beginning of value"
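The repeated `invalid character '<' looking for beginning of value` failures above are consistent with the model prepending a `<think>...</think>` block before its JSON answer, which the JSON parser then chokes on. A minimal sketch of pre-filtering such output before parsing (the `stripThink` helper is illustrative, not part of golightrag):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// thinkRe matches a <think>...</think> block that reasoning models
// such as qwen3 emit before their actual answer. (?s) lets . span newlines.
var thinkRe = regexp.MustCompile(`(?s)<think>.*?</think>`)

// stripThink removes the thinking block and surrounding whitespace so
// the remainder can be handed to a JSON parser.
func stripThink(s string) string {
	return strings.TrimSpace(thinkRe.ReplaceAllString(s, ""))
}

func main() {
	raw := "<think>reasoning goes here</think>\n{\"entities\": []}"
	fmt.Println(stripThink(raw)) // prints {"entities": []}
}
```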
Tested with this config on LM Studio:
docker run -p 7687:7687 -p 7474:7474 -e NEO4J_AUTH=neo4j/password neo4j:latest
cat config.yaml
neo4j_uri: "bolt://localhost:7687"
neo4j_user: "neo4j"
neo4j_password: "password"
rag_llm:
  type: "openai-compat" # Options: openai, openai-compat, anthropic, ollama, openrouter
  api_key: "your-openai-api-key-here"
  host: "http://localhost:1234/v1/"
  model: "qwen3-0.6b-mlx"
  parameters:
    temperature: 0.7
eval_llm:
  type: "openai-compat" # Options: openai, openai-compat, anthropic, ollama, openrouter
  api_key: "your-openai-api-key-here"
  host: "http://localhost:1234/v1/"
  model: "qwen3-0.6b-mlx"
  parameters:
    temperature: 0.7
embedding_api_key: "your-openai-api-key-here"
log_level: "info" # Options: debug, info, warn, error
I attempted to run embedding models as rag_llm. I am not sure how the unit tests work.
My machine isn't fast enough to run the unit tests.
LM Studio logs:

The model keeps emitting its thinking output, and I cannot turn thinking off.
curl http://localhost:1234/v1/chat/completions
Turning off thinking requires a library upgrade.
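Until the library exposes a switch for this, one possible workaround is prompt-level: Qwen3 models document a `/no_think` soft switch that suppresses the thinking block when appended to the user message (this is an assumption about the qwen3 model family, not a golightrag feature, and it may not apply to other models):

```go
package main

import (
	"fmt"
	"strings"
)

// noThink appends Qwen3's documented "/no_think" soft switch to a user
// prompt. This is a model-family convention, not part of golightrag.
func noThink(prompt string) string {
	if strings.HasSuffix(prompt, "/no_think") {
		return prompt
	}
	return prompt + " /no_think"
}

func main() {
	fmt.Println(noThink("Extract entities from the text."))
	// prints: Extract entities from the text. /no_think
}
```

Even with this, stripping any residual `<think>` block before JSON parsing is still a prudent safety net.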