The model has crashed when continuously requested #157

Open

ssbg2 opened this issue May 8, 2025 · 2 comments

Comments

ssbg2 commented May 8, 2025

Hello, while using LM Studio, I have observed that my currently deployed models (Qwen2.5-VL-32B-Instruct, Qwen3-235B-A22B, and DeepSeek R1, all in MLX format) consistently crash during sustained inference calls. My LM Studio version is 0.3.15 (Build 11). Below is the error log from one of the failures:
2025-05-07 18:19:54 [DEBUG]
Received request: POST to /v1/chat/completions with body {
"messages": [
{
"role": "user",
"content": "智能助手名称 :新闻分析专家\n主要任务 :请根据提供的新闻内容分析新闻的重要性,并对新闻内容进行权重... ... "effect": <新闻对涉及行业或公司构成“重大利空、"重大利好"、"利空"、"利好">\n}"
}
],
"model": "glm-4-32b-0414-abliterated",
"n": 1,
"stream": true,
"temperature": 0.7
}
2025-05-07 18:19:54 [INFO]
[LM STUDIO SERVER] Running chat completion on conversation with 1 messages.
2025-05-07 18:19:54 [INFO]
[LM STUDIO SERVER] Streaming response...
2025-05-07 18:19:54 [DEBUG]
[CacheWrapper][INFO] Trimmed 188 tokens from the prompt cache
2025-05-07 18:19:56 [INFO]
[LM STUDIO SERVER] First token generated. Continuing to stream response..
2025-05-08 06:13:34 [DEBUG]
Fatal Python error: Aborted

Current thread 0x00000072c2367000 (most recent call first):
File "/Users/tldev/.lmstudio/extensions/backends/vendor/_amphibian/app-mlx-generate-mac-arm64@37/lib/python3.11/site-packages/mlx/nn/layers/positional_encoding.py", line 47 in call
File "/Users/tldev/.lmstudio/extensions/backends/vendor/_amphibian/app-mlx-generate-mac-arm64@37/lib/python3.11/site-packages/mlx_lm/models/glm4.py", line 92 in call
File "/Users/tldev/.lmstudio/extensions/backends/vendor/_amphibian/app-mlx-generate-mac-arm64@37/lib/python3.11/site-packages/mlx_lm/models/glm4.py", line 125 in call
File "/Users/tldev/.lmstudio/extensions/backends/vendor/_amphibian/app-mlx-generate-mac-arm64@37/lib/python3.11/site-packages/mlx_lm/models/glm4.py", line 159 in call
File "/Users/tldev/.lmstudio/extensions/backends/vendor/_amphibian/app-mlx-generate-mac-arm64@37/lib/python3.11/site-packages/mlx_lm/models/glm4.py", line 178 in call
File "/Users/tldev/.lmstudio/extensions/backends/vendor/_amphibian/app-mlx-generate-mac-arm64@37/lib/python3.11/site-packages/mlx_lm/generate.py", line 357 in _step
File "/Users/tldev/.lmstudio/extensions/backends/vendor/_amphibian/app-mlx-generate-mac-arm64@37/lib/python3.11/site-packages/mlx_lm/generate.py"
2025-05-08 06:13:34 [DEBUG]
, line 391 in generate_step
File "/Users/tldev/.lmstudio/extensions/backends/vendor/_amphibian/app-mlx-generate-mac-arm64@37/lib/python3.11/site-packages/mlx_lm/generate.py", line 625 in
File "/Users/tldev/.lmstudio/extensions/backends/vendor/_amphibian/app-mlx-generate-mac-arm64@37/lib/python3.11/site-packages/mlx_lm/generate.py", line 636 in stream_generate
File "/Users/tldev/.lmstudio/extensions/backends/vendor/_amphibian/app-mlx-generate-mac-arm64@37/lib/python3.11/site-packages/mlx_engine/generate.py", line 345 in create_generator

Thread 0x00000001fc290c80 (most recent call first):

Extension modules: charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, yaml._yaml, markupsafe._speedups, PIL._imaging, numpy._core._multiarray_umath, numpy.linalg._umath_linalg, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, PIL._imagingft, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, regex._regex, xxhash._xxhash (total: 31)
2025-05-08 06:13:35 [ERROR]
The model has crashed without additional information. (Exit code: 6). Error Data: n/a, Additional Data: n/a
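For reference, the crash shows up under a loop of back-to-back streaming requests like the following (a minimal sketch of my client; the port is LM Studio's default local server port, and the prompt is abbreviated, so treat the details as illustrative):

```python
# Minimal sketch of the sustained-request loop that precedes the crash.
# Assumes LM Studio's OpenAI-compatible server on its default port (1234);
# the real prompt is the long news-analysis prompt shown in the log above.
import requests

URL = "http://localhost:1234/v1/chat/completions"

payload = {
    "messages": [{"role": "user", "content": "<long news-analysis prompt>"}],
    "model": "glm-4-32b-0414-abliterated",
    "n": 1,
    "stream": True,
    "temperature": 0.7,
}

while True:
    # Back-to-back streaming completions; the abort only appears after many
    # hours of this (18:19 -> 06:13 in the log above).
    with requests.post(URL, json=payload, stream=True, timeout=600) as resp:
        resp.raise_for_status()
        for _line in resp.iter_lines():
            pass  # drain the SSE stream; the tokens themselves don't matter here
```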

@YorkieDev

cc @neilmehta24

@ssbg2 can you run lms log stream in your terminal and paste the output here when the model crashes?


ssbg2 commented May 9, 2025

> cc @neilmehta24
>
> @ssbg2 can you run lms log stream in your terminal and paste the output here when the model crashes?

Thank you for your reply, but running "lms log stream" does not produce any additional error messages:
timestamp: 2025/5/8 21:37:26
type: llm.prediction.input
modelIdentifier: qwen3-235b-a22b
modelPath: mlx-community/Qwen3-235B-A22B-8bit
input: "智能助手名称 :新闻分析专家
主要任务 :请根据提供的新闻内容分析新闻的重要性,并对新闻内容进行权重打分?
文本分析 :能够准确分析新闻文本的深层含义与意义解读。
权重打分 :根据分析结果,将新闻事件对直接相关领域(如企业、行业、社区、政策等)的即时性、具体影响,并对影响进行评估打分。
当前的任务执行记录:
输入 :【长虹美菱:拟1.5亿元-3亿元回购股份】长虹美菱(000521.SZ)公告称,公司计划使用自有资金以集中竞价交易方式回购部分A股股份,用于股权激励。回购金额不低于1.5亿元且不超过3亿元,回购价格上限为11元/股。回购期限自董事会审议通过之日起不超过12个月。
输出 :以 JSON 的形式输出,输出的 JSON 需遵守以下的格式:
{
"sense": <新闻的深层含义>,
"name": <新闻涉及的行业名称或公司名称>,
"weight": <新闻重要性的权重分值,分值范围为0-10分,整数,重要性高分值大>,
"effect": <新闻对涉及行业或公司构成“重大利空、"重大利好"、"利空"、"利好">
}"

However, I have noticed that the model usually crashes after the following message appears in the developer logs:

[CacheWrapper][WARN] Tried to trim '154987' tokens from the prompt cache, but could not: Cache is not trimmable. Clearing the cache instead.

But there are also times when this message appears and the model does not crash, so I'm confused as well.
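
If I'm reading that warning right, the wrapper falls back from trimming the cache to clearing it, roughly like this (an illustrative sketch only; can_trim, trim, and clear are hypothetical names, not the actual mlx_engine CacheWrapper API):

```python
# Sketch of the trim-or-clear fallback implied by the CacheWrapper messages.
# Hypothetical method names (can_trim, trim, clear); the real mlx_engine
# CacheWrapper API may differ.
def reuse_prompt_cache(cache, num_tokens: int) -> None:
    if cache.can_trim():
        # Normal path: drop only the tokens that no longer match the new prompt.
        cache.trim(num_tokens)
        print(f"[CacheWrapper][INFO] Trimmed {num_tokens} tokens from the prompt cache")
    else:
        # Fallback: some cache types cannot drop leading tokens in place, so the
        # whole cache is discarded and the prompt is re-processed from scratch.
        print(f"[CacheWrapper][WARN] Tried to trim '{num_tokens}' tokens from the "
              f"prompt cache, but could not: Cache is not trimmable. "
              f"Clearing the cache instead.")
        cache.clear()
```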
