README.md
From there you can also check the documentation.
The plugin does not include a large language model (LLM). You need to provide an LLM in the GGUF file format.
A good place to start is something like [Qwen3 4B](https://huggingface.co/Qwen/Qwen3-4B-GGUF/blob/main/Qwen3-4B-Q4_K_M.gguf).
If you need something faster, try with a smaller model (e.g. Qwen3 0.6B). If you need something smarter, try with a larger model (e.g. Qwen3 14B).
If you need something smarter *and* faster, wait a few months.
Have a look at our [model selection guide](https://nobodywho-ooo.github.io/nobodywho/model-selection/) for more in-depth recommendations.
Once you have a GGUF model file, you can add a `NobodyWhoModel` node to your Godot scene. On this node, set the model file to the GGUF model you just downloaded.
We're looking into solutions for including this file automatically.
New language models are coming out at a breakneck pace. If you search the web for "best language models for roleplay" or something similar, you'll probably find results that are several months or years old. You want to use something newer.
Selecting the best model for your use-case is mostly about finding the right trade-off between speed, memory usage and quality of the responses.
Using bigger models will yield better responses, but raise minimum system requirements and slow down generation speed.
Have a look at our [model selection guide](https://nobodywho-ooo.github.io/nobodywho/model-selection/) for more in-depth recommendations.
### NobodyWho makes Godot crash on Arch Linux / Manjaro
The Godot build currently in the Arch Linux repositories does not work with GDExtensions at all.
docs/chat/structured-output.md
You should additionally provide the right context and single- or few-shot prompts.
### Underscores footgun
The GBNF format does not support `_`. According to [the GBNF format documentation](https://github.com/ggml-org/llama.cpp/tree/master/grammars#json-schemas--gbnf), only lowercase characters and dashes are allowed for naming nonterminals.
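For example, this minimal grammar sketch (the rule names are made up for illustration) shows a legal rule name next to an illegal one:

```
# legal: only lowercase letters and dashes in rule names
npc-reply ::= "Hello, " npc-name
npc-name  ::= [a-z]+

# illegal: underscores are not allowed in rule names
# npc_reply ::= "Hello!"
```

If you generate grammars from JSON schemas, keep this in mind when choosing property names that end up as nonterminals.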
Beware not to add too many symbols to your backstory. If the model cannot write a `.`, it is more likely to end the sentence instead of writing paragraph upon paragraph of text.
docs/rag.md
Great! You've got chat and embeddings working. Now let's add something useful: th
Picture this: Your player is 40 hours into your RPG and asks an NPC "Where do I find that crystal for the sword upgrade?"
Your LLM, without reranking, might give a generic answer or, worse, make something up, leading to a bad player experience.
There are several ways to combat this. One is to load a lot of information into the context (i.e. the system prompt), but with a limited context the model might 'forget' the important information
or be confused by too much information. Instead, we want to add a "long-term memory" module to our language model.
To do this in the LLM space you are going to use RAG (retrieval-augmented generation): we enrich the knowledge of the LLM by allowing it to search through a database of information we feed it.
There are many ways to do this. In NobodyWho we currently expose two major ways. One is embeddings: converting a sentence to a vector and then finding the vectors that are closest to it.
This is powerful as you can save the vectors to a database or a file beforehand and then use the really fast and cheap cosine similarity to compare them. Another, more expensive but more accurate, way is to use a cross-encoder that figures out the relationship between the question and the document rather than just how similar they are.
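The embedding idea can be sketched in plain Python. The three-dimensional vectors below are made up for illustration; a real embedding model produces vectors with hundreds of dimensions, but the ranking step is the same:

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical pre-computed document embeddings (real ones have
# hundreds of dimensions and come from an embedding model).
documents = {
    "The fire crystal is hidden in the Ember Caves.": [0.9, 0.1, 0.2],
    "The blacksmith upgrades swords with crystals.":  [0.7, 0.3, 0.1],
    "The tavern serves ale every evening.":           [0.1, 0.9, 0.8],
}

# Embedding of the player's question (also illustrative).
query_embedding = [0.8, 0.2, 0.15]

# Rank documents by cosine similarity to the query.
ranked = sorted(
    documents,
    key=lambda d: cosine_similarity(documents[d], query_embedding),
    reverse=True,
)
```

Because the document vectors can be computed once and stored, only the query needs to be embedded at runtime; the comparison itself is just a handful of multiplications per document.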
This approach is often called reranking, because it is used as a second step for sorting and filtering large knowledge databases accessed by LLMs. I'll call it ranking, as we are working with a dataset small enough that we do not need a first pass to filter out irrelevant info.
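The shape of that step can be sketched as follows. Note that `cross_encoder_score` here is a toy stand-in (simple word overlap) for a real cross-encoder model, which reads the question and document *together* rather than comparing two vectors; all names and data are illustrative:

```python
def cross_encoder_score(question: str, document: str) -> float:
    # Toy stand-in for a real cross-encoder: scores the *pair*
    # (question, document) by word overlap. A real model learns
    # the relationship between the two texts instead.
    shared = set(question.lower().split()) & set(document.lower().split())
    return len(shared) / max(len(document.split()), 1)

knowledge_base = [
    "The fire crystal is hidden in the Ember Caves.",
    "The blacksmith upgrades swords with crystals.",
    "The tavern serves ale every evening.",
]

question = "Where do I find the crystal for the sword upgrade?"

# With a small knowledge base we can score every document directly
# (no cheap first pass needed) and hand the best match to the LLM.
ranked = sorted(
    knowledge_base,
    key=lambda doc: cross_encoder_score(question, doc),
    reverse=True,
)
top = ranked[0]
```

In a large knowledge base you would first use cheap cosine similarity to keep only the top few candidates, then apply the expensive pairwise scoring to those; with a handful of documents, scoring everything directly is fine.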
Reranking models are different from chat and embedding models. You need one spec
We recommend [bge-reranker-v2-m3-Q8_0.gguf](https://huggingface.co/gpustack/bge-reranker-v2-m3-GGUF/resolve/main/bge-reranker-v2-m3-Q8_0.gguf) - it works well for most games and supports multiple languages.
Note that the current Qwen3 reranker does not work, due to how they created the template: it has some missing fields.
## Practical Example: Smart NPC with Knowledge Base