Commit 59f9b31

feat: support disable thinking when VLLM model (#616)

* feat: add deep think switch button
* feat: api model support VLLM
* feat: api support disable thinking when VLLM model
* docs: update readme

Signed-off-by: Bob Du <i@bobdu.cc>

1 parent d43d724 · commit 59f9b31

File tree

22 files changed: +290 −74 lines changed

README.en.md

Lines changed: 63 additions & 0 deletions

````diff
@@ -34,6 +34,8 @@ Some unique features have been added:
 [] Web Search functionality (real-time web search based on the Tavily API)
 
+[] VLLM API model support & optional disabling of deep thinking mode
+
 > [!CAUTION]
 > This project is only published on GitHub under the MIT license, free and intended for open-source learning use. There will be no account selling, paid services, discussion groups, or other such activities of any kind. Beware of scams.
@@ -125,6 +127,10 @@ For all parameter variables, check [here](#docker-parameter-example) or see:
 [] Interface themes
 
+[] VLLM API model support
+
+[] Deep thinking mode switch
+
 [] More...
 
 ## Prerequisites
@@ -318,6 +324,63 @@ PS: You can also run `pnpm start` directly on the server without packaging.
 pnpm build
 ```
 
+## VLLM API Deep Thinking Mode Control
+
+> [!TIP]
+> Deep thinking mode control is only available when the backend is configured to use the VLLM API. It lets users choose whether to enable the model's deep thinking functionality.
+
+### Features
+
+- **VLLM API Exclusive**: Only available when the backend uses the VLLM API
+- **Per-conversation Control**: Each conversation can independently enable or disable deep thinking mode
+- **Real-time Switching**: Deep thinking mode can be toggled at any time during a conversation
+- **Performance Optimization**: Disabling deep thinking can improve response speed and reduce computational cost
+
+### Prerequisites
+
+**The following conditions must be met to use this feature:**
+
+1. **Backend Configuration**: The backend must be configured to use the VLLM API interface
+2. **Model Support**: The model in use must support deep thinking
+3. **API Compatibility**: The VLLM API version must support the thinking-mode control parameter
+
+### Usage
+
+#### 1. Enable/Disable Deep Thinking Mode
+
+1. **Enter the Conversation Interface**: Open a conversation session backed by the VLLM API
+2. **Find the Control Switch**: Locate the "Deep Thinking" toggle button in the conversation interface
+3. **Switch Mode**:
+   - Enabled: the model performs deep thinking, providing more detailed and in-depth responses
+   - Disabled: the model responds directly; faster, but potentially more concise
+
+#### 2. Usage Scenarios
+
+**Enabling deep thinking is recommended when:**
+- Complex problems require in-depth analysis
+- Logical reasoning and multi-step thinking are needed
+- High-quality responses are required
+- Response time is not critical
+
+**Disabling deep thinking is recommended when:**
+- Simple questions need quick answers
+- Fast responses are required
+- Computational costs need to be reduced
+- Batch-processing simple tasks
+
+#### 3. Technical Implementation
+
+- **API Parameter**: Controlled through the vLLM-specific `chat_template_kwargs.enable_thinking` parameter
+- **State Persistence**: Each conversation session independently saves its deep thinking switch state
+- **Real-time Effect**: Takes effect immediately for the next message after switching
+
+### Notes
+
+- **VLLM API Only**: This feature is only available when the backend uses the VLLM API; other APIs (such as the OpenAI API) do not support it
+- **Model Dependency**: Not all models support deep thinking mode; confirm that your model supports this feature
+- **Response Differences**: Disabling deep thinking may reduce the detail and quality of responses
+- **Cost Considerations**: Enabling deep thinking typically increases computational cost and response time
+
 ## Frequently Asked Questions
 
 Q: Why does Git always report an error when committing?
````
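The request-body shape described in the README section above can be made concrete with a short TypeScript sketch. This is a minimal sketch assuming an OpenAI-compatible vLLM server that honors `chat_template_kwargs.enable_thinking` (this is model- and chat-template-dependent); `buildVllmChatBody` and the model name are illustrative, not part of the project:

```typescript
// Sketch: build an OpenAI-style chat request body for a vLLM backend,
// toggling deep thinking via the non-standard chat_template_kwargs field.

interface ChatMessage {
  role: 'system' | 'user' | 'assistant'
  content: string
}

function buildVllmChatBody(model: string, messages: ChatMessage[], thinkEnabled: boolean) {
  return {
    model,
    messages,
    // vLLM-only extension; not part of the OpenAI API.
    chat_template_kwargs: { enable_thinking: thinkEnabled },
  }
}

// Example: a request with deep thinking disabled (model name is hypothetical).
const body = buildVllmChatBody('qwen3-8b', [{ role: 'user', content: 'Hi' }], false)
```

Disabling thinking this way trades response depth for latency, which matches the "Usage Scenarios" guidance above.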

README.md

Lines changed: 63 additions & 0 deletions

````diff
@@ -34,6 +34,8 @@
 [] Web Search functionality (real-time web search based on the Tavily API)
 
+[] VLLM API model support & optional disabling of deep thinking mode
+
 > [!CAUTION]
 > Disclaimer: this project is only published on GitHub under the MIT license, free and intended for open-source learning use. There will be no account selling, paid services, discussion groups, or other such activities of any kind. Beware of scams.
@@ -128,6 +130,10 @@
 [] Interface themes
 
+[] VLLM API model support
+
+[] Deep thinking mode switch
+
 [] More...
 
 ## Prerequisites
@@ -454,6 +460,63 @@ Current time: {current_time}
 - Each session can independently control whether the search feature is used
 
+## VLLM API Deep Thinking Mode Control
+
+> [!TIP]
+> Deep thinking mode control is only available when the backend is configured to use the VLLM API. It lets users choose whether to enable the model's deep thinking functionality.
+
+### Features
+
+- **VLLM API Exclusive**: Only available when the backend uses the VLLM API
+- **Per-conversation Control**: Each conversation can independently enable or disable deep thinking mode
+- **Real-time Switching**: Deep thinking mode can be toggled at any time during a conversation
+- **Performance Optimization**: Disabling deep thinking can improve response speed and reduce computational cost
+
+### Prerequisites
+
+**The following conditions must be met to use this feature:**
+
+1. **Backend Configuration**: The backend must be configured to use the VLLM API interface
+2. **Model Support**: The model in use must support deep thinking
+3. **API Compatibility**: The VLLM API version must support the thinking-mode control parameter
+
+### Usage
+
+#### 1. Enable/Disable Deep Thinking Mode
+
+1. **Enter the Conversation Interface**: Open a conversation session backed by the VLLM API
+2. **Find the Control Switch**: Locate the "Deep Thinking" toggle button in the conversation interface
+3. **Switch Mode**:
+   - Enabled: the model performs deep thinking, providing more detailed and in-depth responses
+   - Disabled: the model responds directly; faster, but potentially more concise
+
+#### 2. Usage Scenarios
+
+**Enabling deep thinking is recommended when:**
+- Complex problems require in-depth analysis
+- Logical reasoning and multi-step thinking are needed
+- High-quality responses are required
+- Response time is not critical
+
+**Disabling deep thinking is recommended when:**
+- Simple questions need quick answers
+- Fast responses are required
+- Computational costs need to be reduced
+- Batch-processing simple tasks
+
+#### 3. Technical Implementation
+
+- **API Parameter**: Controlled through the vLLM-specific `chat_template_kwargs.enable_thinking` parameter
+- **State Persistence**: Each conversation session independently saves its deep thinking switch state
+- **Real-time Effect**: Takes effect immediately for the next message after switching
+
+### Notes
+
+- **VLLM API Only**: This feature is only available when the backend uses the VLLM API; other APIs (such as the OpenAI API) do not support it
+- **Model Dependency**: Not all models support deep thinking mode; confirm that your model supports this feature
+- **Response Differences**: Disabling deep thinking may reduce the detail and quality of responses
+- **Cost Considerations**: Enabling deep thinking typically increases computational cost and response time
+
 ## Frequently Asked Questions
 Q: Why does `Git` always report an error when committing?
````
459522

service/src/chatgpt/index.ts

Lines changed: 27 additions & 10 deletions

```diff
@@ -1,5 +1,4 @@
-import type { AuditConfig, KeyConfig, UserInfo } from '../storage/model'
-import type { ModelConfig } from '../types'
+import type { AuditConfig, Config, KeyConfig, UserInfo } from '../storage/model'
 import type { TextAuditService } from '../utils/textAudit'
 import type { ChatMessage, RequestOptions } from './types'
 import { tavily } from '@tavily/core'
@@ -102,10 +101,18 @@ async function chatReplyProcess(options: RequestOptions) {
   const searchConfig = globalConfig.searchConfig
   if (searchConfig.enabled && searchConfig?.options?.apiKey && searchEnabled) {
     messages[0].content = renderSystemMessage(searchConfig.systemMessageGetSearchQuery, dayjs().format('YYYY-MM-DD HH:mm:ss'))
-    const completion = await openai.chat.completions.create({
+    const getSearchQueryChatCompletionCreateBody: OpenAI.ChatCompletionCreateParamsNonStreaming = {
       model,
       messages,
-    })
+    }
+    if (key.keyModel === 'VLLM') {
+      // @ts-expect-error vLLM supports a set of parameters that are not part of the OpenAI API.
+      getSearchQueryChatCompletionCreateBody.chat_template_kwargs = {
+        enable_thinking: false,
+      }
+    }
+    const completion = await openai.chat.completions.create(getSearchQueryChatCompletionCreateBody)
     let searchQuery: string = completion.choices[0].message.content
     const match = searchQuery.match(/<search_query>([\s\S]*)<\/search_query>/i)
     if (match)
@@ -144,7 +151,7 @@ search result: <search_result>${searchResult}</search_result>`,
   messages[0].content = systemMessage
 
   // Create the chat completion with streaming
-  const stream = await openai.chat.completions.create({
+  const chatCompletionCreateBody: OpenAI.ChatCompletionCreateParamsStreaming = {
     model,
     messages,
     temperature: temperature ?? undefined,
@@ -153,9 +160,19 @@ search result: <search_result>${searchResult}</search_result>`,
     stream_options: {
       include_usage: true,
     },
-  }, {
-    signal: abort.signal,
-  })
+  }
+  if (key.keyModel === 'VLLM') {
+    // @ts-expect-error vLLM supports a set of parameters that are not part of the OpenAI API.
+    chatCompletionCreateBody.chat_template_kwargs = {
+      enable_thinking: options.room.thinkEnabled,
+    }
+  }
+  const stream = await openai.chat.completions.create(
+    chatCompletionCreateBody,
+    {
+      signal: abort.signal,
+    },
+  )
 
   // Process the stream
   let responseReasoning = ''
@@ -253,8 +270,8 @@ async function containsSensitiveWords(audit: AuditConfig, text: string): Promise
 }
 
 async function chatConfig() {
-  const config = await getOriginConfig() as ModelConfig
-  return sendResponse<ModelConfig>({
+  const config = await getOriginConfig()
+  return sendResponse<Config>({
     type: 'Success',
     data: config,
   })
```
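The conditional augmentation in this diff can be distilled into a small helper. This is a sketch, not the commit's code; `withThinkingControl` and the simplified `CompletionBody` type are assumptions standing in for the OpenAI SDK's `ChatCompletionCreateParams` types:

```typescript
// Sketch of the pattern above: keep the body OpenAI-compatible, and attach
// the vLLM-only extension only when the configured key is a VLLM key.

type KeyModel = 'ChatGPTAPI' | 'VLLM'

interface CompletionBody {
  model: string
  messages: { role: string, content: string }[]
  temperature?: number
  // vLLM-specific field; must stay absent for other backends.
  chat_template_kwargs?: { enable_thinking: boolean }
}

function withThinkingControl(body: CompletionBody, keyModel: KeyModel, thinkEnabled: boolean): CompletionBody {
  if (keyModel !== 'VLLM')
    return body // non-vLLM backends get the unmodified OpenAI-style body
  return { ...body, chat_template_kwargs: { enable_thinking: thinkEnabled } }
}
```

Returning a new object rather than mutating avoids the `@ts-expect-error` the in-place assignment needs, at the cost of not matching the SDK's param type exactly.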

service/src/index.ts

Lines changed: 1 addition & 6 deletions

```diff
@@ -146,7 +146,6 @@ router.post('/session', async (req, res) => {
   const hasAuth = config.siteConfig.loginEnabled || config.siteConfig.authProxyEnabled
   const authProxyEnabled = config.siteConfig.authProxyEnabled
   const allowRegister = config.siteConfig.registerEnabled
-  config.apiModel = 'ChatGPTAPI'
   const userId = await getUserId(req)
   const chatModels: {
     label: string
@@ -173,7 +172,6 @@ router.post('/session', async (req, res) => {
       data: {
         auth: hasAuth,
         allowRegister,
-        model: config.apiModel,
         title: config.siteConfig.siteTitle,
         chatModels,
         allChatModels: chatModelOptions,
@@ -227,7 +225,6 @@ router.post('/session', async (req, res) => {
         auth: hasAuth,
         authProxyEnabled,
         allowRegister,
-        model: config.apiModel,
         title: config.siteConfig.siteTitle,
         chatModels,
         allChatModels: chatModelOptions,
@@ -246,7 +243,6 @@ router.post('/session', async (req, res) => {
         auth: hasAuth,
         authProxyEnabled,
         allowRegister,
-        model: config.apiModel,
         title: config.siteConfig.siteTitle,
         chatModels: chatModelOptions,
         allChatModels: chatModelOptions,
@@ -659,11 +655,10 @@ router.post('/verifyadmin', authLimiter, async (req, res) => {
 
 router.post('/setting-base', rootAuth, async (req, res) => {
   try {
-    const { apiKey, apiModel, apiBaseUrl, accessToken, timeoutMs, reverseProxy, socksProxy, socksAuth, httpsProxy } = req.body as Config
+    const { apiKey, apiBaseUrl, accessToken, timeoutMs, reverseProxy, socksProxy, socksAuth, httpsProxy } = req.body as Config
 
     const thisConfig = await getOriginConfig()
     thisConfig.apiKey = apiKey
-    thisConfig.apiModel = apiModel
     thisConfig.apiBaseUrl = apiBaseUrl
     thisConfig.accessToken = accessToken
     thisConfig.reverseProxy = reverseProxy
```

service/src/routes/room.ts

Lines changed: 18 additions & 0 deletions

```diff
@@ -10,6 +10,7 @@ import {
   updateRoomChatModel,
   updateRoomPrompt,
   updateRoomSearchEnabled,
+  updateRoomThinkEnabled,
   updateRoomUsingContext,
 } from '../storage/mongo'
 
@@ -29,6 +30,7 @@ router.get('/chatrooms', auth, async (req, res) => {
       usingContext: r.usingContext === undefined ? true : r.usingContext,
       chatModel: r.chatModel,
       searchEnabled: !!r.searchEnabled,
+      thinkEnabled: !!r.thinkEnabled,
     })
   })
   res.send({ status: 'Success', message: null, data: result })
@@ -153,6 +155,22 @@ router.post('/room-search-enabled', auth, async (req, res) => {
   }
 })
 
+router.post('/room-think-enabled', auth, async (req, res) => {
+  try {
+    const userId = req.headers.userId as string
+    const { thinkEnabled, roomId } = req.body as { thinkEnabled: boolean, roomId: number }
+    const success = await updateRoomThinkEnabled(userId, roomId, thinkEnabled)
+    if (success)
+      res.send({ status: 'Success', message: 'Saved successfully', data: null })
+    else
+      res.send({ status: 'Fail', message: 'Saved Failed', data: null })
+  }
+  catch (error) {
+    console.error(error)
+    res.send({ status: 'Fail', message: 'Update error', data: null })
+  }
+})
+
 router.post('/room-context', auth, async (req, res) => {
   try {
     const userId = req.headers.userId as string
```
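The new `/room-think-enabled` route casts `req.body` without validating it. A sketch of the expected request contract; `parseThinkEnabledBody` is a hypothetical validator added here for illustration only, not part of the commit:

```typescript
// Sketch: validate the /room-think-enabled request body before persisting,
// instead of trusting the `req.body as { ... }` cast.

interface ThinkEnabledBody {
  thinkEnabled: boolean
  roomId: number
}

function parseThinkEnabledBody(body: unknown): ThinkEnabledBody | null {
  if (typeof body !== 'object' || body === null)
    return null
  const { thinkEnabled, roomId } = body as Record<string, unknown>
  if (typeof thinkEnabled !== 'boolean' || typeof roomId !== 'number')
    return null
  return { thinkEnabled, roomId }
}
```

A route could return the existing `{ status: 'Fail' }` envelope when the parser yields `null`, keeping the error shape consistent with the other room endpoints.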

service/src/storage/config.ts

Lines changed: 2 additions & 4 deletions

```diff
@@ -27,7 +27,7 @@ export async function getCacheConfig(): Promise<Config> {
 export async function getOriginConfig() {
   let config = await getConfig()
   if (config == null) {
-    config = new Config(new ObjectId(), !Number.isNaN(+process.env.TIMEOUT_MS) ? +process.env.TIMEOUT_MS : 600 * 1000, process.env.OPENAI_API_KEY, process.env.OPENAI_API_DISABLE_DEBUG === 'true', process.env.OPENAI_ACCESS_TOKEN, process.env.OPENAI_API_BASE_URL, 'ChatGPTAPI', process.env.API_REVERSE_PROXY, (process.env.SOCKS_PROXY_HOST && process.env.SOCKS_PROXY_PORT)
+    config = new Config(new ObjectId(), !Number.isNaN(+process.env.TIMEOUT_MS) ? +process.env.TIMEOUT_MS : 600 * 1000, process.env.OPENAI_API_KEY, process.env.OPENAI_API_DISABLE_DEBUG === 'true', process.env.OPENAI_ACCESS_TOKEN, process.env.OPENAI_API_BASE_URL, process.env.API_REVERSE_PROXY, (process.env.SOCKS_PROXY_HOST && process.env.SOCKS_PROXY_PORT)
       ? (`${process.env.SOCKS_PROXY_HOST}:${process.env.SOCKS_PROXY_PORT}`)
       : '', (process.env.SOCKS_PROXY_USERNAME && process.env.SOCKS_PROXY_PASSWORD)
       ? (`${process.env.SOCKS_PROXY_USERNAME}:${process.env.SOCKS_PROXY_PASSWORD}`)
@@ -149,9 +149,7 @@ export async function getApiKeys() {
   const result = await getKeys()
   const config = await getCacheConfig()
   if (result.keys.length <= 0) {
-    if (config.apiModel === 'ChatGPTAPI')
-      result.keys.push(await upsertKey(new KeyConfig(config.apiKey, 'ChatGPTAPI', [], [], '')))
-
+    result.keys.push(await upsertKey(new KeyConfig(config.apiKey, 'ChatGPTAPI', [], [], '')))
     result.total++
   }
   result.keys.forEach((key) => {
```

service/src/storage/model.ts

Lines changed: 4 additions & 3 deletions

```diff
@@ -83,14 +83,16 @@ export class ChatRoom {
   status: Status = Status.Normal
   chatModel: string
   searchEnabled: boolean
-  constructor(userId: string, title: string, roomId: number, chatModel: string, searchEnabled: boolean) {
+  thinkEnabled: boolean
+  constructor(userId: string, title: string, roomId: number, chatModel: string, searchEnabled: boolean, thinkEnabled: boolean) {
     this.userId = userId
     this.title = title
     this.prompt = undefined
     this.roomId = roomId
     this.usingContext = true
     this.chatModel = chatModel
     this.searchEnabled = searchEnabled
+    this.thinkEnabled = thinkEnabled
   }
 }
 
@@ -197,7 +199,6 @@ export class Config {
   public apiDisableDebug?: boolean,
   public accessToken?: string,
   public apiBaseUrl?: string,
-  public apiModel?: APIMODEL,
   public reverseProxy?: string,
   public socksProxy?: string,
   public socksAuth?: string,
@@ -304,4 +305,4 @@ export class UserPrompt {
   }
 }
 
-export type APIMODEL = 'ChatGPTAPI'
+export type APIMODEL = 'ChatGPTAPI' | 'VLLM'
```
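One migration detail worth noting: `ChatRoom` documents persisted before this commit have no `thinkEnabled` field, and `routes/room.ts` coerces it with `!!r.thinkEnabled`, so legacy rooms default to thinking disabled. A sketch under that assumption; `StoredRoom` and `toApiRoom` are illustrative names, not the project's code:

```typescript
// Sketch: how rooms saved before this commit surface through the API.
// A missing thinkEnabled field coerces to false (thinking disabled).

interface StoredRoom {
  roomId: number
  thinkEnabled?: boolean // absent on rooms created before this commit
}

function toApiRoom(r: StoredRoom) {
  return {
    roomId: r.roomId,
    thinkEnabled: !!r.thinkEnabled, // same coercion used in routes/room.ts
  }
}
```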
