How are we handling the local model loading process for `.gguf` files in iOS? #83
-
**What Stanford Spezi module is your challenge related to?**

Spezi

**Description**

Hello developer, I hope you are doing well. I found your repository at StanfordSpezi/SpeziLLM and have been following the instructions for using SpeziLLM to load local LLM models in an iOS application. However, I keep encountering an error: `SpeziLLMLocal: Local LLM file could not be opened`, indicating that the model file doesn't exist. This happens even though the model file appears to be recognized correctly (its file size is reported). The model initially loads without error, but as soon as I attempt to generate a response, I get an "LLM file not found" error at runtime.

Could you please share how you are handling the local model loading process for `.gguf` files in iOS? Do you have any specific path configuration or other steps inside the MLX/SpeziLLMLocal pipeline that I should know about, beyond placing the `.gguf` file in the Documents directory and setting up `LLMLocalSchema`? A minimal sketch of how we locate the file is shown below.

Thank you in advance for any guidance. I appreciate your time!

Best regards,
korean student(01H-W-H10)
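For context, this is roughly how we resolve the model file in the Documents directory before handing it to the schema (a minimal sketch using only Foundation; the file name is the model we are testing and everything beyond this existence check is app-specific):

```swift
import Foundation

// Minimal sketch: resolve the expected model location in the app's
// Documents directory and verify the file is actually there.
// The file name below is an assumption based on the model we are testing.
let documentsURL = FileManager.default
    .urls(for: .documentDirectory, in: .userDomainMask)[0]
let modelURL = documentsURL
    .appendingPathComponent("tinyllama-1.1b-chat-v1.0.Q4_0.gguf")

if FileManager.default.fileExists(atPath: modelURL.path) {
    // The file is found and its size is reported correctly here,
    // yet generation still fails later with "LLM file not found".
    print("Model found at \(modelURL.path)")
} else {
    print("Model missing at \(modelURL.path)")
}
```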
-
@shimjunho Thank you for reaching out! Could you please follow these steps to help us help you:
-
Hello! I am on the same team as shimjunho.
Hi @shimjunho, thank you for your question!
The most recent SpeziLLM version does not support `.gguf` models; we have updated the module to work with MLX. Therefore, if you want to use the `tinyllama-1.1b-chat-v1.0.Q4_0` model, you can either convert it yourself using the MLX convert function, or search Hugging Face (https://huggingface.co/mlx-community?search_models=llama&sort_models=downloads#models) for an existing MLX model (e.g. https://huggingface.co/pcuenq/tiny-llama-chat-mlx).

I would also like to point out a somewhat more powerful model: https://huggingface.co/mlx-community/Llama-3.2-1B-Instruct-4bit
In SpeziLLM, you can choose to either set the model in the config to `.llama3_2_1B_4bit` …
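As a rough illustration, wiring that model into an app might look like the sketch below. This is a minimal sketch assuming the MLX-based `LLMLocalPlatform`/`LLMLocalSchema(model:)` API from recent SpeziLLMLocal releases; the exact initializers and parameters may differ between versions, so please check the current SpeziLLM documentation.

```swift
import Spezi
import SpeziLLM
import SpeziLLMLocal

// Minimal sketch, assuming the MLX-based SpeziLLMLocal API:
// register the local LLM platform once in the app's Spezi configuration.
class LocalLLMAppDelegate: SpeziAppDelegate {
    override var configuration: Configuration {
        Configuration {
            LLMRunner {
                LLMLocalPlatform()
            }
        }
    }
}

// Later, request a session for the model mentioned above.
// `.llama3_2_1B_4bit` is the config value named in this reply; further
// schema parameters (sampling, context size, …) are release-dependent.
func makeSession(using runner: LLMRunner) -> LLMLocalSession {
    runner(
        with: LLMLocalSchema(model: .llama3_2_1B_4bit)
    )
}
```

With this setup, SpeziLLM manages downloading and locating the MLX model itself, so no manual `.gguf` path handling in the Documents directory should be needed.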