@hungrymonkey hungrymonkey commented Aug 29, 2025

go test -bench=. -timeout=0

time=2025-08-29T10:22:58.423-04:00 level=INFO msg="Inserting file" rag=light path=docs/christmascarol.txt
time=2025-08-29T10:22:58.524-04:00 level=INFO msg="Upserting sources" rag=light package=golightrag function=Insert count=8
time=2025-08-29T10:22:58.533-04:00 level=INFO msg="Extracting entities" rag=light package=golightrag function=Insert count=8
time=2025-08-29T10:23:25.934-04:00 level=WARN msg="Retry parse result" rag=light package=golightrag function=Insert retry=1 error="failed to parse llm result: invalid character '<' looking for beginning of value"
time=2025-08-29T10:23:52.162-04:00 level=WARN msg="Retry parse result" rag=light package=golightrag function=Insert retry=1 error="failed to parse llm result: invalid character '<' looking for beginning of value"
time=2025-08-29T10:23:58.534-04:00 level=WARN msg="Retry extract" rag=light package=golightrag function=Insert retry=1 error="failed to call LLM: error sending request: Post \"http://localhost:1234/v1/chat/completions\": context deadline exceeded"
time=2025-08-29T10:23:58.534-04:00 level=WARN msg="Retry extract" rag=light package=golightrag function=Insert retry=1 error="failed to call LLM: error sending request: Post \"http://localhost:1234/v1/chat/completions\": context deadline exceeded"
time=2025-08-29T10:23:58.534-04:00 level=WARN msg="Retry extract" rag=light package=golightrag function=Insert retry=1 error="failed to call LLM: error sending request: Post \"http://localhost:1234/v1/chat/completions\": context deadline exceeded"
time=2025-08-29T10:24:21.802-04:00 level=WARN msg="Retry parse result" rag=light package=golightrag function=Insert retry=2 error="failed to parse llm result: invalid character '<' looking for beginning of value"
time=2025-08-29T10:24:34.413-04:00 level=WARN msg="Retry parse result" rag=light package=golightrag function=Insert retry=2 error="failed to parse llm result: invalid character '<' looking for beginning of value"
time=2025-08-29T10:24:43.889-04:00 level=WARN msg="Retry parse result" rag=light package=golightrag function=Insert retry=2 error="failed to parse llm result: invalid character '<' looking for beginning of value"
time=2025-08-29T10:24:55.717-04:00 level=WARN msg="Retry parse result" rag=light package=golightrag function=Insert retry=2 error="failed to parse llm result: invalid character '<' looking for beginning of value"
time=2025-08-29T10:25:01.537-04:00 level=WARN msg="Retry extract" rag=light package=golightrag function=Insert retry=2 error="failed to call LLM: error sending request: Post \"http://localhost:1234/v1/chat/completions\": context deadline exceeded"
time=2025-08-29T10:25:14.906-04:00 level=WARN msg="Retry parse result" rag=light package=golightrag function=Insert retry=3 error="failed to parse llm result: invalid character '<' looking for beginning of value"

Tested with this config on LM Studio:

docker run -p 7687:7687 -p 7474:7474 -e NEO4J_AUTH=neo4j/password neo4j:latest

cat config.yaml
neo4j_uri: "bolt://localhost:7687"
neo4j_user: "neo4j"
neo4j_password: "password"

rag_llm:
  type: "openai-compat"  # Options: openai, openai-compat, anthropic, ollama, openrouter
  api_key: "your-openai-api-key-here"
  host: "http://localhost:1234/v1/"
  model: "qwen3-0.6b-mlx"
  parameters:
    temperature: 0.7

eval_llm:
  type: "openai-compat"  # Options: openai, openai-compat, anthropic, ollama, openrouter
  api_key: "your-openai-api-key-here"
  host: "http://localhost:1234/v1/"
  model: "qwen3-0.6b-mlx"
  parameters:
    temperature: 0.7

embedding_api_key: "your-openai-api-key-here"

log_level: "info"  # Options: debug, info, warn, error

@hungrymonkey
Author

I also tried pointing rag_llm at an embedding model:

time=2025-08-29T10:20:11.274-04:00 level=WARN msg="Retry extract" rag=light package=golightrag function=Insert retry=3 error="failed to call LLM: error sending request: error, status code: 404, status: 404 Not Found, message: Failed to load model \"kolosal/qwen3-embedding-0.6b\". Error: Model is not llm."
time=2025-08-29T10:20:12.319-04:00 level=WARN msg="Retry extract" rag=light package=golightrag function=Insert retry=3 error="failed to call LLM: error sending request: error, status code: 404, status: 404 Not Found, message: Failed to load model \"kolosal/qwen3-embedding-0.6b\". Error: Model is not llm."
time=2025-08-29T10:20:13.113-04:00 level=WARN msg="Retry extract" rag=light package=golightrag function=Insert retry=3 error="failed to call LLM: error sending request: error, status code: 404, status: 404 Not Found, message: Failed to load model \"kolosal/qwen3-embedding-0.6b\". Error: Model is not llm."

I am not sure how the unit tests work.

@hungrymonkey
Author

My machine isn't fast enough to run the unit tests.

@hungrymonkey
Author

LM Studio logs:

 [qwen3-0.6b-mlx] Generated prediction:  {
  "id": "chatcmpl-4s14euxkw1hqus2n31sk6s",
  "object": "chat.completion",
  "created": 1756477751,
  "model": "qwen3-0.6b-mlx",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "<think>\nOkay, let's see. The user provided a text document and wants me to extract entities of specified types from it. The entity_types are [character, organization, location, time period, object, theme, event]. \n\nFirst, I need to scan through the given text. The text starts with an HTML table that lists some links and metadata about an eBook. But looking at the actual content, I don't see any text related to entities like person names, organizations, locations, etc. \n\nWait, maybe there's a typo in the user's input? The provided text seems to be just metadata about an eBook, not a document with content. So there are no entities in the text to process according to the instructions. \n\nBut perhaps I should check again. The user's example outputs have entities like \"Alex\" and \"Taylor\". However, in the given text",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 4070,
    "completion_tokens": 178,
    "total_tokens": 4248
  },
  "stats": {},
  "system_fingerprint": "qwen3-0.6b-mlx"
}
2025-08-29 10:35:17  [INFO]
 [LM STUDIO SERVER] Running chat completion on conversation with 1 messages.
2025-08-29 10:35:17  [INFO]
 [LM STUDIO SERVER] Running chat completion on conversation with 1 messages.
2025-08-29 10:35:17  [INFO]
 [LM STUDIO SERVER] Running chat completion on conversation with 1 messages.
2025-08-29 10:35:17  [INFO]
 [LM STUDIO SERVER] Running chat completion on conversation with 1 messages.
2025-08-29 10:35:17  [INFO]
 [LM STUDIO SERVER] Running chat completion on conversation with 1 messages.
2025-08-29 10:35:18  [INFO]
 [LM STUDIO SERVER] Client disconnected. Stopping generation... (If the model is busy processing the prompt, it will finish first.)
2025-08-29 10:35:18  [INFO]
 [qwen3-0.6b-mlx] Model generated tool calls:  []
2025-08-29 10:35:18  [INFO]
 [qwen3-0.6b-mlx] Generated prediction:  {
  "id": "chatcmpl-k66f5h4i4mioiwcqbnttqk",
  "object": "chat.completion",
  "created": 1756478117,
  "model": "qwen3-0.6b-mlx",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {},
  "stats": {},
  "system_fingerprint": "qwen3-0.6b-mlx"
}
2025-08-29 10:35:18  [INFO]
 [LM STUDIO SERVER] Client disconnected. Stopping generation... (If the model is busy processing the prompt, it will finish first.)
2025-08-29 10:35:18  [INFO]
 [qwen3-0.6b-mlx] Model generated tool calls:  []
2025-08-29 10:35:18  [INFO]
 [qwen3-0.6b-mlx] Generated prediction:  {
  "id": "chatcmpl-l7fb4w4itci3wialebm3",
  "object": "chat.completion",
  "created": 1756478117,
  "model": "qwen3-0.6b-mlx",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {},
  "stats": {},
  "system_fingerprint": "qwen3-0.6b-mlx"
}
2025-08-29 10:35:18  [INFO]
 [LM STUDIO SERVER] Client disconnected. Stopping generation... (If the model is busy processing the prompt, it will finish first.)
2025-08-29 10:35:18  [INFO]
 [qwen3-0.6b-mlx] Model generated tool calls:  []
2025-08-29 10:35:18  [INFO]
 [qwen3-0.6b-mlx] Generated prediction:  {
  "id": "chatcmpl-dn6xn1rx195w2pwpxsks1a",
  "object": "chat.completion",
  "created": 1756478117,
  "model": "qwen3-0.6b-mlx",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {},
  "stats": {},
  "system_fingerprint": "qwen3-0.6b-mlx"
}
2025-08-29 10:35:18  [INFO]
 [LM STUDIO SERVER] Client disconnected. Stopping generation... (If the model is busy processing the prompt, it will finish first.)
2025-08-29 10:35:18  [INFO]
 [qwen3-0.6b-mlx] Model generated tool calls:  []
2025-08-29 10:35:18  [INFO]
 [qwen3-0.6b-mlx] Generated prediction:  {
  "id": "chatcmpl-3vah0ksxtsxkznu6m7uzr",
  "object": "chat.completion",
  "created": 1756478117,
  "model": "qwen3-0.6b-mlx",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {},
  "stats": {},
  "system_fingerprint": "qwen3-0.6b-mlx"
}
2025-08-29 10:35:18  [INFO]
 [LM STUDIO SERVER] Client disconnected. Stopping generation... (If the model is busy processing the prompt, it will finish first.)
2025-08-29 10:35:18  [INFO]
 [qwen3-0.6b-mlx] Model generated tool calls:  []
2025-08-29 10:35:18  [INFO]
 [qwen3-0.6b-mlx] Generated prediction:  {
  "id": "chatcmpl-2itr1egtpdezidzlj0et68",
  "object": "chat.completion",
  "created": 1756478117,
  "model": "qwen3-0.6b-mlx",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "<think>\nOkay, let's tackle this problem. The user provided a text document and a list of entity types",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 4083,
    "completion_tokens": 23,
    "total_tokens": 4106
  },
  "stats": {},
  "system_fingerprint": "qwen3-0.6b-mlx"
}

@hungrymonkey
Author

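The model is thinking: every response starts with a <think> block, which is why parsing fails with "invalid character '<' looking for beginning of value".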

sashabaranov/go-openai#980

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-0.6b-mlx",
    "messages": [
      { "role": "system", "content": "Always answer in rhymes. Today is Thursday" },
      { "role": "user", "content": "What day is it today?" }
    ],
    "temperature": 0.7,
    "max_tokens": -1,
    "stream": false
}'
{
  "id": "chatcmpl-x4nfc347sskej459b0xe6s",
  "object": "chat.completion",
  "created": 1756479156,
  "model": "qwen3-0.6b-mlx",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "<think>\nOkay, the user asked \"Today is Thursday\" and then continued with a question about what day it's today. But the system says to always answer in rhymes. Let me check again.\n\nWait, maybe there's a misunderstanding here. The original query was \"Today is Thursday,\" and the user then asked, \"What day is it today?\" So they made a mistake in their question. But I need to respond in rhymes. Let me try that.\n\nToday is Thursday, the day we're all at home. We come in and go out for a while. Maybe add something like \"The day is bright, we're so happy.\" That's rhyming. Let me make sure it fits together. Yeah, that works.\n</think>\n\nToday is Thursday! The day we’re all at home,  \nWe come in and go out with our friends.  \nThe sky’s blue and the sun is up,  \nAnd we’re all laughing in our days.",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 198,
    "total_tokens": 226
  },
  "stats": {},
  "system_fingerprint": "qwen3-0.6b-mlx"
}

I cannot turn off thinking.

@hungrymonkey
Author

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-0.6b-mlx",
    "messages": [
      { "role": "system", "content": "Always answer in rhymes. Today is Thursday" },
      { "role": "user", "content": "What day is it today?" }
    ],
    "temperature": 0.7,
    "max_tokens": -1,
    "stream": false,
    "enable_think": false
}'
{
  "id": "chatcmpl-rg98cl9p6ypbolnz1j94md",
  "object": "chat.completion",
  "created": 1756479248,
  "model": "qwen3-0.6b-mlx",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "<think>\nOkay, the user asked, \"What day is it today?\" and I need to respond in rhymes. Let me start by thinking about the current date. Since today is Thursday, I need to make a rhyme with that.\n\nThursday rhymes with... Thursday? Maybe not. Let me think of another word. What about \"Tuesday\" and \"Wednesday\"? But the user asked for today's date, so maybe focus on Thursday. \n\nI should make a four-syllable rhyme pair. Let me try: \"Thursday, Thursday, the day we're here.\" Then maybe add a second line that rhymes with \"Thursday.\" \n\n\"Today's the day we're here, and it's blue as a sky.\" Wait, \"blue as a sky\" rhymes with \"Thursday.\" That works. Let me check the syllables: Thursday (4), Thursday (4), today's... (3). That might work. \n\nAlternatively, could I make it more concise? Like \"Thursday, Thursday, today's the day we're here.\" Then the second line could be \"And it's blue as a sky.\" That seems good. I think that's the rhyming couplet.\n</think>\n\n\"Thursday, Thursday, today's the day we're here. \nAnd it's blue as a sky, and bright like a ray.\"",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 274,
    "total_tokens": 302
  },
  "stats": {},
  "system_fingerprint": "qwen3-0.6b-mlx"
}

@hungrymonkey
Author

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-0.6b-mlx",
    "messages": [
      { "role": "system", "content": "Always answer in rhymes. Today is Thursday" },
      { "role": "user", "content": "What day is it today?" }
    ],
    "temperature": 0.7,
    "max_tokens": -1,
    "stream": false,
    "chat_template_kwargs": {"enable_thinking": false}
}'
{
  "id": "chatcmpl-1cen9zxnvi285o7576lk4v",
  "object": "chat.completion",
  "created": 1756479339,
  "model": "qwen3-0.6b-mlx",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "<think>\nOkay, the user asked, \"Today is Thursday.\" But the system responded with \"What day is it today?\" which is a follow-up. I need to adjust my response to make sure it's both helpful and in rhymes.\n\nFirst, the user is confirming today's date. The original answer was about Thursday. So I should respond directly to their confirmation, maybe using a rhyme with \"Thursday\" and something related. Let me think of rhyming words. Maybe start with \"Today is Thursday!\" and then add a second line that's in rhymes.\n\nPossible lines: \"Today is Thursday, the day of the week!\" and \"And so I'll make it a happy day!\" That works. Let me check the rhyme scheme. \"Thursday\" and \"week\" rhyme? Not exactly, but maybe adjust. Alternatively, \"Today is Thursday!\" and \"And so the day's done!\" That could work too. Both lines rhyme, making it easy to follow.\n</think>\n\nToday is Thursday!  \nAnd so I'll make it a happy day!",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 217,
    "total_tokens": 245
  },
  "stats": {},
  "system_fingerprint": "qwen3-0.6b-mlx"
}

@hungrymonkey
Author

Turning off thinking requires a library upgrade:

hungrymonkey@a71f49b

hungrymonkey pushed a commit to hungrymonkey/go-light-rag that referenced this pull request Sep 4, 2025