EPYC 9654: successfully reproduced kt 0.2.3post2, reporting my results #979
CYSTEV-chn
started this conversation in
Show and tell
-
That's a good speed, basically enough to use comfortably. Could you try the speed of Unsloth Q2_K_XL? I'm deciding whether to expand my memory enough to run Q4. https://huggingface.co/unsloth/DeepSeek-R1-GGUF/tree/main/DeepSeek-R1-UD-Q2_K_XL
-
Isn't prefill a bit slow? Is it just as slow with longer prompts?
-
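Hardware:
CPU: EPYC 9654 ×1
Motherboard: Gigabyte MZ33-AR0
Memory: Samsung DDR5 64 GB ×12
GPU: RTX 4090 24 GB (blower-style)
Storage: Samsung M.2 4 TB
Case: cardboard ATX case
PSU: Great Wall 2200 W (80 Plus Gold)
The memory slots on this lousy motherboard block the GPU, so I also added a PCIe x16 riser cable.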
BIOS: NPS=1, SMT off, AVX512 on, core count auto
OS1: Windows Server 2022
OS2: Ubuntu 24.04
Driver: NVIDIA 570
CUDA toolkit: 12.4.1
Python: 3.12
Git
openai library
flashinfer
AI model: the unsloth DeepSeek 671B Q4 GGUF files (split into shards), pulled from Hugging Face
Launch command:
export HF_ENDPOINT="https://hf-mirror.com"
python -m ktransformers.local_chat \
  --model_path deepseek-ai/DeepSeek-R1 \
  --gguf_path /home/dministrator/models/DeepSeek-R1-Q4_K_M \
  --max_new_tokens 4096 \
  --total_context 102800 \
  --cpu_infer 84 \
  --cache_q4 true \
  --temperature 0.6 \
  --top_p 0.95
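Test environments:
OS1: WSL2 Ubuntu
ktransformers 0.2.3post2-fancy
OS2: Ubuntu
ktransformers 0.2.3post2-fancy
local chat mode:
Prompt: "Please tell a joke in 50 characters or fewer." (original: 请说一段50字以内的笑话。)
OS1: eval 9 tps
OS2: eval 14.02 tps
Conclusion: on native Ubuntu, generation speed (tps) is roughly 40-55% faster than on Windows or in virtualized environments such as Windows WSL2, Docker, Anaconda, or MSVC.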
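For anyone who wants to check the percentage against their own runs, here is a minimal sketch of the arithmetic using the two eval figures reported above. The function name `speedup_percent` is just an illustrative helper, not part of ktransformers, and this covers only the single pair of numbers from this post, not a benchmark.

```python
def speedup_percent(baseline_tps: float, new_tps: float) -> float:
    """Return how much faster new_tps is than baseline_tps, in percent."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return (new_tps / baseline_tps - 1.0) * 100.0


# Figures taken from the results above (one short prompt per OS).
wsl2_tps = 9.0      # OS1: WSL2 Ubuntu
native_tps = 14.02  # OS2: native Ubuntu

gain = speedup_percent(wsl2_tps, native_tps)
print(f"native Ubuntu is {gain:.1f}% faster than WSL2 in this run")
```

For this one pair of numbers the result lands at about 56%, right at the upper end of the range in the conclusion. Swap in your own tps values to compare other setups.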

Thanks to the KT team, and thanks to maaaxinfinity, the author of the one-click Ubuntu install script for KT.