README.md
From there you can also check the documentation.
The plugin does not include a large language model (LLM). You need to provide an LLM in the GGUF file format.
A good place to start is something like [Qwen3 4B](https://huggingface.co/Qwen/Qwen3-4B-GGUF/blob/main/Qwen3-4B-Q4_K_M.gguf).
If you need something faster, try with a smaller model (e.g. Qwen3 0.6B). If you need something smarter, try with a larger model (e.g. Qwen3 14B).
If you need something smarter *and* faster, wait a few months.
Have a look at our [model selection guide](https://nobodywho-ooo.github.io/nobodywho/model-selection/) for more in-depth recommendations.
Once you have a GGUF model file, you can add a `NobodyWhoModel` node to your Godot scene. On this node, set the model file to the GGUF model you just downloaded.
We're looking into solutions for including this file automatically.
New language models are coming out at a breakneck pace. If you search the web for "best language models for roleplay" or something similar, you'll probably find results that are several months or years old. You want to use something newer.
Selecting the best model for your use-case is mostly about finding the right trade-off between speed, memory usage and quality of the responses.
Using bigger models will yield better responses, but raise minimum system requirements and slow down generation speed.
Have a look at our [model selection guide](https://nobodywho-ooo.github.io/nobodywho/model-selection/) for more in-depth recommendations.
### NobodyWho makes Godot crash on Arch Linux / Manjaro
The Godot build currently in the Arch Linux repositories does not work with GDExtensions at all.
docs/chat/structured-output.md
You should additionally provide the right context and single- or few-shot prompts.
### Underscores footgun
The GBNF format does not support `_`. According to [the GBNF format documentation](https://github.com/ggml-org/llama.cpp/tree/master/grammars#json-schemas--gbnf), only lowercase characters and dashes are allowed for naming nonterminals.
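For example, this minimal grammar sketch (the rule names are made up for illustration) shows a legal rule name next to an illegal one:

```
# legal: only lowercase letters and dashes in rule names
npc-reply ::= "Hello, " npc-name
npc-name  ::= [a-z]+

# illegal: underscores are not allowed in rule names
# npc_reply ::= "Hello!"
```

If you generate grammars from JSON schemas, keep this in mind when choosing property names that end up as nonterminals.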
Beware not to add too many symbols to your backstory. If the model cannot write a `.`, it is more likely to end the sentence instead of writing paragraph upon paragraph of text.
docs/rag.md
Great! You've got chat and embeddings working. Now let's add something useful: th
Picture this: Your player is 40 hours into your RPG and asks an NPC "Where do I find that crystal for the sword upgrade?"
Your LLM, without reranking, might give a generic answer or, worse, make something up, leading to a bad player experience.
There are several ways to combat this. One is to load a lot of information into the context (i.e. the system prompt), but with a limited context the model might 'forget' the important information
or be confused by too much information. Instead, we want to add a "long-term memory" module to our language model.
To do this in the LLM space you are going to use RAG (retrieval-augmented generation): we enrich the knowledge of the LLM by allowing it to search through a database of information we feed it.
There are many ways to do this. In NobodyWho we currently expose two major ways. One is embeddings: converting a sentence to a vector and then finding the vectors that are closest to it.
This is powerful as you can save the vectors to a database or a file beforehand and then use the really fast and cheap cosine similarity to compare them. Another, more expensive but more accurate, way is to use a cross-encoder that figures out the relationship between the question and the document rather than just how similar they are.
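The embedding idea can be sketched in plain Python. The three-dimensional vectors below are made up for illustration; a real embedding model produces vectors with hundreds of dimensions, but the ranking step is the same:

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical pre-computed document embeddings (real ones have
# hundreds of dimensions and come from an embedding model).
documents = {
    "The fire crystal is hidden in the Ember Caves.": [0.9, 0.1, 0.2],
    "The blacksmith upgrades swords with crystals.":  [0.7, 0.3, 0.1],
    "The tavern serves ale every evening.":           [0.1, 0.9, 0.8],
}

# Embedding of the player's question (also illustrative).
query_embedding = [0.8, 0.2, 0.15]

# Rank documents by cosine similarity to the query.
ranked = sorted(
    documents,
    key=lambda d: cosine_similarity(documents[d], query_embedding),
    reverse=True,
)
```

Because the document vectors can be computed once and stored, only the query needs to be embedded at runtime; the comparison itself is just a handful of multiplications per document.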
This approach is often called reranking, because it is used as a second step for sorting and filtering large knowledge databases accessed by LLMs. I'll call it ranking, as we are working with a dataset small enough that we do not need a first pass to filter out irrelevant info.
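The shape of that step can be sketched as follows. Note that `cross_encoder_score` here is a toy stand-in (simple word overlap) for a real cross-encoder model, which reads the question and document *together* rather than comparing two vectors; all names and data are illustrative:

```python
def cross_encoder_score(question: str, document: str) -> float:
    # Toy stand-in for a real cross-encoder: scores the *pair*
    # (question, document) by word overlap. A real model learns
    # the relationship between the two texts instead.
    shared = set(question.lower().split()) & set(document.lower().split())
    return len(shared) / max(len(document.split()), 1)

knowledge_base = [
    "The fire crystal is hidden in the Ember Caves.",
    "The blacksmith upgrades swords with crystals.",
    "The tavern serves ale every evening.",
]

question = "Where do I find the crystal for the sword upgrade?"

# With a small knowledge base we can score every document directly
# (no cheap first pass needed) and hand the best match to the LLM.
ranked = sorted(
    knowledge_base,
    key=lambda doc: cross_encoder_score(question, doc),
    reverse=True,
)
top = ranked[0]
```

In a large knowledge base you would first use cheap cosine similarity to keep only the top few candidates, then apply the expensive pairwise scoring to those; with a handful of documents, scoring everything directly is fine.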
Reranking models are different from chat and embedding models. You need one spec
We recommend [bge-reranker-v2-m3-Q8_0.gguf](https://huggingface.co/gpustack/bge-reranker-v2-m3-GGUF/resolve/main/bge-reranker-v2-m3-Q8_0.gguf) - it works well for most games and supports multiple languages.
Note that the current Qwen3 reranker does not work, due to how they created the template: it has some missing fields.
## Practical Example: Smart NPC with Knowledge Base