Commit b27ba9d

feat: support speech to speech by nova-sonic (#39)
1 parent 55d33cd commit b27ba9d

40 files changed

+3199
-281
lines changed

README.md

Lines changed: 74 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -16,21 +16,23 @@
1616
SwiftChat is a fast and responsive AI chat application developed with [React Native](https://reactnative.dev/) and
1717
powered by [Amazon Bedrock](https://aws.amazon.com/bedrock/), with compatibility extending to other model providers such
1818
as Ollama, DeepSeek, OpenAI and OpenAI Compatible. With its minimalist design philosophy and robust privacy protection,
19-
it delivers real-time streaming conversations and AI image generation capabilities across Android, iOS, and macOS
20-
platforms.
19+
it delivers real-time streaming conversations, AI image generation and voice conversation assistant capabilities
20+
across Android, iOS, and macOS platforms.
2121

2222
![](assets/promo.avif)
2323

2424
### What's New 🔥
2525

26+
- 🚀 Support speech to speech with Amazon Nova Sonic on Apple platforms. Check [How to Use](#amazon-nova-sonic) for
27+
more details (From v2.3.0).
28+
- Support displaying request latency and token response speed (From v2.3.0).
29+
- Switch to a new bubble UI format for user questions (From v2.3.0).
2630
- Support for OpenAI Compatible models. You can now
2731
use [easy-model-deployer](https://github.com/aws-samples/easy-model-deployer),
2832
OpenRouter, or any OpenAI-compatible model provider via SwiftChat. Please
2933
check the [Configure OpenAI Compatible](#openai-compatible) section for more details (From v2.2.0).
30-
- Support for quick model switching (From v2.2.0).
31-
- Support regeneration of AI responses (From v2.2.0).
3234

33-
**Key Features:**
35+
### Key Features
3436

3537
- Real-time streaming chat with AI
3638
- Rich Markdown Support: Tables, Code Blocks, LaTeX and More
@@ -45,7 +47,44 @@ platforms.
4547
and [OpenAI Compatible](#openai-compatible) Models)
4648
- Fully Customizable System Prompt Assistant
4749

48-
**Supported Features For Amazon Nova series**
50+
### Amazon Nova Series Features
51+
52+
#### Amazon Nova Sonic Speech to Speech Model
53+
54+
**Usage Guide**
55+
56+
1. The Amazon Nova Sonic model is supported starting from v2.3.0. If you deployed SwiftChat before, you need to:
57+
* [Update the CloudFormation](#upgrade-cloudformation) stack
58+
* [Update API](#upgrade-api)
59+
* [Upgrade your App](#-quick-download) to v2.3.0 or later
60+
61+
If you have not deployed your CloudFormation stack yet, please
62+
complete the [Getting Started with Amazon Bedrock](#getting-started-with-amazon-bedrock) section.
63+
2. Switch the **Region** to `us-east-1` on the settings page and select `Nova Sonic` under **Chat Model**.
64+
3. Return to the chat page, then select a system prompt or click the microphone icon directly to start your conversation.
65+
66+
**Features for Speech to Speech**
67+
68+
1. Built-in spoken language practice for words and sentences, as well as storytelling scenarios. You can also add
69+
**Custom System Prompts** for voice chatting in different scenarios.
70+
2. Support **Barge In** by default; you can also disable it in the system prompt.
71+
3. Support selecting a voice on the settings page, including American/British English and male or female voices.
72+
4. Support **Echo Cancellation**, so you can talk directly to the device without wearing headphones.
73+
5. Support a **Voice Waveform** that displays the volume level.
74+
75+
**General Talk**
76+
77+
https://github.com/user-attachments/assets/d3028312-c420-476c-88c2-ba870015f3c4
78+
79+
**Learn Sentences**
80+
81+
https://github.com/user-attachments/assets/ebf21b12-9c93-4d2e-a109-1d6484019838
82+
83+
**Telling a Story on Mac (with the barge-in feature)**
84+
85+
https://github.com/user-attachments/assets/c70fc2b4-8960-4a5e-b4f8-420fcd5eafd4
86+
87+
#### Other Features
4988

5089
- Record 30-second videos directly on Android and iOS for Nova analysis
5190
- Upload large videos (1080p/4K) beyond 8MB with auto compression
@@ -57,7 +96,7 @@ platforms.
5796
#### YouTube Video
5897

5998
[<img src="./assets/youtube.avif">](https://www.youtube.com/watch?v=rey05WzfEbM)
60-
> The content in the video is an early version. For UI, architecture, and inconsistencies, please refer to the current
99+
> The content in the video is an early version. For UI, architecture, and inconsistencies, please refer to the current
61100
> documentation.
62101
63102
**Comprehensive Multimodal Analysis**: Text, Image, Document and Video
@@ -111,7 +150,7 @@ this [example](https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examp
111150
Ensure you have access to Amazon Bedrock foundation models. SwiftChat default settings are:
112151

113152
- Region: `us-west-2`
114-
- Text Model: `Amazon Nova Pro`
153+
- Chat Model: `Amazon Nova Pro`
115154
- Image Model: `Stable Diffusion 3.5 Large`
116155

117156
If you are using the image generation feature, please make sure you have enabled access to the `Amazon Nova Lite` model.
@@ -195,7 +234,7 @@ Congratulations 🎉 Your SwiftChat App is ready to use!
195234
```bash
196235
http://localhost:11434
197236
```
198-
3. Once the correct Server URL is entered, you can select your desired Ollama models from the **Text Model** dropdown
237+
3. Once the correct Server URL is entered, you can select your desired Ollama models from the **Chat Model** dropdown
199238
list.
200239

201240
</details>
@@ -207,7 +246,7 @@ Congratulations 🎉 Your SwiftChat App is ready to use!
207246

208247
1. Go to the **Settings Page** and select the **DeepSeek** tab.
209248
2. Input your DeepSeek API Key.
210-
3. Choose DeepSeek models from the **Text Model** dropdown list. Currently, the following DeepSeek models are supported:
249+
3. Choose DeepSeek models from the **Chat Model** dropdown list. Currently, the following DeepSeek models are supported:
211250
- `DeepSeek-V3`
212251
- `DeepSeek-R1`
213252

@@ -220,9 +259,12 @@ Congratulations 🎉 Your SwiftChat App is ready to use!
220259

221260
1. Navigate to the **Settings Page** and select the **OpenAI** tab.
222261
2. Enter your OpenAI API Key.
223-
3. Select OpenAI models from the **Text Model** dropdown list. The following OpenAI models are currently supported:
262+
3. Select OpenAI models from the **Chat Model** dropdown list. The following OpenAI models are currently supported:
224263
- `GPT-4o`
225264
- `GPT-4o mini`
265+
- `GPT-4.1`
266+
- `GPT-4.1 mini`
267+
- `GPT-4.1 nano`
226268

227269
Additionally, if you have deployed the [ClickStream Server](#step-2-deploy-stack-and-get-your-api-url), you can enable
228270
the **Use Proxy** option to forward your requests.
@@ -239,7 +281,7 @@ the **Use Proxy** option to forward your requests.
239281
- `Base URL` of your model provider
240282
- `API Key` of your model provider
241283
- `Model ID` of the models you want to use (separate multiple models with commas)
242-
3. Select one of your models from the **Text Model** dropdown list.
284+
3. Select one of your models from the **Chat Model** dropdown list.
243285

244286
</details>
245287

@@ -379,6 +421,26 @@ the [release notes](https://github.com/aws-samples/swift-chat/releases) to see i
379421
- **For Lambda**: Click and open [Lambda Services](https://console.aws.amazon.com/lambda/home#/functions), find and open
380422
the Lambda function whose name starts with `SwiftChatLambda-xxx`, click the **Deploy new image** button, and click Save.
381423

424+
### Upgrade CloudFormation
425+
426+
1. Open [CloudFormation](https://console.aws.amazon.com/cloudformation) and switch to the region where you
427+
deployed the **SwiftChatAPI** stack.
428+
2. Select the **SwiftChatAPI** stack, then click **Update stack** -> **Make a direct update**.
429+
3. On the **Update stack** page, select **Replace existing template**, then under **Amazon S3 URL** enter the
430+
following template URL.
431+
432+
For App Runner
433+
```
434+
https://aws-gcr-solutions.s3.amazonaws.com/swift-chat/latest/SwiftChatAppRunner.template
435+
```
436+
For Lambda
437+
```
438+
https://aws-gcr-solutions.s3.amazonaws.com/swift-chat/latest/SwiftChatLambda.template
439+
```
440+
4. Click **Next**, then click **Next** again. On the **Configure stack options** page,
441+
check `I acknowledge that AWS CloudFormation might create IAM resources.`, then click **Next** and **Submit** to
442+
update your CloudFormation stack.
443+
382444
## Security
383445

384446
See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.
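
The "Upgrade CloudFormation" steps in the README diff above are written for the console. As a rough sketch only, the same update could be described as a request object. This assumes the field names follow the CloudFormation `UpdateStack` API (`StackName`, `TemplateURL`, `Capabilities`, `Parameters`) and that the IAM acknowledgement checkbox corresponds to `CAPABILITY_IAM`. The helper name `buildUpdateStackParams` and the parameter keys passed to it are hypothetical; the stack's real parameter names are not shown in this commit.

```typescript
// Template URLs copied from the README's "Upgrade CloudFormation" section.
const TEMPLATE_URLS = {
  appRunner:
    "https://aws-gcr-solutions.s3.amazonaws.com/swift-chat/latest/SwiftChatAppRunner.template",
  lambda:
    "https://aws-gcr-solutions.s3.amazonaws.com/swift-chat/latest/SwiftChatLambda.template",
} as const;

type DeployTarget = keyof typeof TEMPLATE_URLS;

// Subset of the UpdateStack request shape used by this sketch.
interface UpdateStackParams {
  StackName: string;
  TemplateURL: string;
  Capabilities: string[];
  Parameters: { ParameterKey: string; UsePreviousValue: boolean }[];
}

// Hypothetical helper: builds the request for replacing the template while
// keeping the previous value of every existing stack parameter.
function buildUpdateStackParams(
  target: DeployTarget,
  existingParameterKeys: string[],
  stackName: string = "SwiftChatAPI",
): UpdateStackParams {
  return {
    StackName: stackName,
    TemplateURL: TEMPLATE_URLS[target],
    // Assumed equivalent of the "might create IAM resources" checkbox.
    Capabilities: ["CAPABILITY_IAM"],
    Parameters: existingParameterKeys.map((key) => ({
      ParameterKey: key,
      UsePreviousValue: true,
    })),
  };
}
```

For example, `buildUpdateStackParams("lambda", ["ExampleParam"])` produces a request that points at the Lambda template; `ExampleParam` is a placeholder, not a real stack parameter. The helper is kept pure so it can be inspected before being handed to whichever AWS client is in use.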

README_CN.md

Lines changed: 65 additions & 9 deletions
@@ -15,18 +15,19 @@
1515

1616
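SwiftChat is a fast and responsive AI chat application developed with [React Native](https://reactnative.dev/)
1717
and powered by [Amazon Bedrock](https://aws.amazon.com/bedrock/), with compatibility extending to Ollama, DeepSeek, OpenAI, and other OpenAI API compatible model providers.
18-
With its minimalist design philosophy and robust privacy protection, it delivers real-time streaming conversations and AI image generation on Android, iOS, and macOS.
18+
With its minimalist design philosophy and robust privacy protection, it delivers real-time streaming conversations, AI image generation, and a voice conversation assistant on Android, iOS, and macOS.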
1919

2020
![](assets/promo.avif)
2121

2222
### What's New 🔥
2323

24+
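- 🚀 Support speech to speech with Amazon Nova Sonic on Apple platforms. Check [How to Use](#amazon-nova-系列功能) for more details (From v2.3.0).
25+
- Support displaying request latency and token response speed (From v2.3.0).
26+
- Show user questions in a new bubble UI format (From v2.3.0).
2427
- Support for OpenAI Compatible models. You can now use [easy-model-deployer](https://github.com/aws-samples/easy-model-deployer),
2528
OpenRouter, or any OpenAI API compatible model through SwiftChat. Check the [Configure OpenAI Compatible](#openai-compatible) section for more details (From v2.2.0).
26-
- Support for quick model switching (From v2.2.0).
27-
- Support regeneration of AI responses (From v2.2.0).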
2829

29-
**Key Features:**
30+
### Key Features
3031

3132
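- Real-time streaming chat with AI
3233
- Rich Markdown rendering: tables, code blocks, LaTeX formulas, and more
@@ -42,7 +43,42 @@ SwiftChat is a fast and responsive AI chat application developed with [React Native](https
4243
[OpenAI Compatible](#openai-compatible) models)
4344
- Fully customizable system prompt assistant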
4445

45-
**Supported Features For Amazon Nova series**
46+
### Amazon Nova Series Features
47+
48+
#### Amazon Nova Sonic Speech to Speech Model
49+
50+
**Usage Guide**
51+
52+
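1. The Amazon Nova Sonic model is supported starting from v2.3.0. If you deployed SwiftChat before, you need to:
53+
* [Update the CloudFormation](#升级-cloudformation) stack
54+
* [Update API](#升级-api)
55+
* [Upgrade your App](#-快速下载) to v2.3.0 or later
56+
57+
If you have not deployed your CloudFormation stack yet, please complete the [Getting Started with Amazon Bedrock](#入门指南---使用-amazon-bedrock-上的模型) section.
58+
2. Switch the **Region** to `us-east-1` on the settings page and select `Nova Sonic` under **Chat Model**.
59+
3. Return to the chat page, then select a system prompt or tap the microphone icon directly to start your conversation.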
60+
61+
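**Features for Speech to Speech**
62+
63+
1. Built-in spoken language practice for words and sentences, as well as storytelling scenarios. You can also add **Custom System Prompts** for voice chatting in different scenarios.
64+
2. Support selecting a voice on the settings page, including American/British English and male or female voices.
65+
3. Support **Barge In** by default; you can also disable it in the system prompt.
66+
4. Support **Echo Cancellation**, so you can talk directly to the device without wearing headphones.
67+
5. Support a **Voice Waveform** that displays the volume level.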
68+
69+
**General Talk**
70+
71+
https://github.com/user-attachments/assets/d3028312-c420-476c-88c2-ba870015f3c4
72+
73+
**Learn Sentences**
74+
75+
https://github.com/user-attachments/assets/ebf21b12-9c93-4d2e-a109-1d6484019838
76+
77+
**Telling a Story on Mac (barge-in demo)**
78+
79+
https://github.com/user-attachments/assets/c70fc2b4-8960-4a5e-b4f8-420fcd5eafd4
80+
81+
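#### Other Features
4682

4783
- Record videos of up to 30 seconds directly on Android and iOS devices for Nova analysis
4884
- Upload HD videos (1080p/4K) larger than 8MB with automatic compression
@@ -176,7 +212,7 @@ SwiftChat is a fast and responsive AI chat application developed with [React Native](https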
176212
```bash
177213
http://localhost:11434
178214
```
179-
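3. Once the correct Server URL is entered, you can select your desired Ollama models from the **Text Model** dropdown list.
215+
3. Once the correct Server URL is entered, you can select your desired Ollama models from the **Chat Model** dropdown list.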
180216

181217
</details>
182218

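@@ -187,7 +223,7 @@ SwiftChat is a fast and responsive AI chat application developed with [React Native](https
187223

188224
1. Go to the **Settings Page** and select the **DeepSeek** tab.
189225
2. Enter your DeepSeek API Key.
190-
3. Choose a DeepSeek model from the **Text Model** dropdown list. The following DeepSeek models are currently supported:
226+
3. Choose a DeepSeek model from the **Chat Model** dropdown list. The following DeepSeek models are currently supported: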
191227
- `DeepSeek-V3`
192228
- `DeepSeek-R1`
193229

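@@ -200,9 +236,12 @@ SwiftChat is a fast and responsive AI chat application developed with [React Native](https
200236

201237
1. Go to the **Settings Page** and select the **OpenAI** tab.
202238
2. Enter your OpenAI API Key.
203-
3. Select an OpenAI model from the **Text Model** dropdown list. The following OpenAI models are currently supported:
239+
3. Select an OpenAI model from the **Chat Model** dropdown list. The following OpenAI models are currently supported: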
204240
- `GPT-4o`
205241
- `GPT-4o mini`
242+
- `GPT-4.1`
243+
- `GPT-4.1 mini`
244+
- `GPT-4.1 nano`
206245

207246
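Additionally, if you have deployed the [ClickStream Server](#第-2-步-部署堆栈并获取-api-url), you can enable the **Use Proxy** option to forward your requests.
208247

@@ -218,7 +257,7 @@ SwiftChat is a fast and responsive AI chat application developed with [React Native](https
218257
- `Base URL` of your model provider
219258
- `API Key` of your model provider
220259
- `Model ID` of the models you want to use (separate multiple models with ASCII commas)
221-
3. Select one of your models from the **Text Model** dropdown list.
260+
3. Select one of your models from the **Chat Model** dropdown list.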
222261

223262
</details>
224263

@@ -357,6 +396,23 @@ npm run ios
357396
- **For Lambda**: Open the [Lambda Services](https://console.aws.amazon.com/lambda/home#/functions) page, find and open
358397
the Lambda function whose name starts with `SwiftChatLambda-xxx`, click the **Deploy new image** button, and click Save.
359398

399+
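### Upgrade CloudFormation
400+
401+
1. Open [CloudFormation](https://console.aws.amazon.com/cloudformation) and switch to the region where you deployed the **SwiftChatAPI** stack.
402+
2. Select the **SwiftChatAPI** stack, then click **Update stack** -> **Make a direct update**.
403+
3. On the **Update stack** page, select **Replace existing template**, then under **Amazon S3 URL** enter the following template URL.
404+
405+
For App Runner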
406+
```
407+
https://aws-gcr-solutions.s3.amazonaws.com/swift-chat/latest/SwiftChatAppRunner.template
408+
```
409+
For Lambda
410+
```
411+
https://aws-gcr-solutions.s3.amazonaws.com/swift-chat/latest/SwiftChatLambda.template
412+
```
413+
4. Click **Next**, then click **Next** again. On the **Configure stack options** page,
414+
check `I acknowledge that AWS CloudFormation might create IAM resources.`, then click **Next** and **Submit** to update your CloudFormation stack.
415+
360416
## Security
361417

362418
See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.

react-native/Gemfile.lock

Lines changed: 4 additions & 2 deletions
@@ -5,11 +5,12 @@ GEM
55
base64
66
nkf
77
rexml
8-
activesupport (7.0.8.3)
8+
activesupport (6.1.7.10)
99
concurrent-ruby (~> 1.0, >= 1.0.2)
1010
i18n (>= 1.6, < 2)
1111
minitest (>= 5.1)
1212
tzinfo (~> 2.0)
13+
zeitwerk (~> 2.3)
1314
addressable (2.8.6)
1415
public_suffix (>= 2.0.2, < 6.0)
1516
algoliasearch (1.27.5)
@@ -88,13 +89,14 @@ GEM
8889
colored2 (~> 3.1)
8990
nanaimo (~> 0.3.0)
9091
rexml (>= 3.3.6, < 4.0)
92+
zeitwerk (2.6.18)
9193

9294
PLATFORMS
9395
ruby
9496

9597
DEPENDENCIES
9698
activesupport (>= 6.1.7.5, < 7.1.0)
97-
cocoapods (>= 1.13, < 1.15)
99+
cocoapods (>= 1.13, < 1.17)
98100

99101
RUBY VERSION
100102
ruby 3.2.2p53
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
1+
//
2+
// VoiceChatModule.m
3+
// SwiftChat
4+
//
5+
// Created on 2025/4/10.
6+
//
7+
8+
#import <React/RCTBridgeModule.h>
9+
#import <React/RCTEventEmitter.h>
10+
11+
@interface RCT_EXTERN_MODULE(VoiceChatModule, RCTEventEmitter)
12+
13+
RCT_EXTERN_METHOD(initialize:(NSDictionary *)config
14+
withResolver:(RCTPromiseResolveBlock)resolve
15+
withRejecter:(RCTPromiseRejectBlock)reject)
16+
17+
RCT_EXTERN_METHOD(startConversation:(NSString *)systemPrompt
18+
withVoiceId:(NSString *)voiceId
19+
withAllowInterruption:(BOOL)allowInterruption
20+
withResolver:(RCTPromiseResolveBlock)resolve
21+
withRejecter:(RCTPromiseRejectBlock)reject)
22+
23+
24+
RCT_EXTERN_METHOD(endConversation:(RCTPromiseResolveBlock)resolve
25+
withRejecter:(RCTPromiseRejectBlock)reject)
26+
27+
RCT_EXTERN_METHOD(updateCredentials:(NSDictionary *)config
28+
withResolver:(RCTPromiseResolveBlock)resolve
29+
withRejecter:(RCTPromiseRejectBlock)reject)
30+
31+
@end
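
The bridge above exposes four promise-based methods to JavaScript. A minimal sketch of how the React Native side might drive them follows, assuming the module is reachable as `NativeModules.VoiceChatModule` from `react-native`. The `VoiceChatNative` interface and the `startVoiceSession` helper are hypothetical names introduced here. The contents of the `config` dictionary are not shown in this diff, so it is typed as an open record. The native module is passed in as an argument so the sketch stays self-contained.

```typescript
// Shape of the native module as declared in VoiceChatModule.m.
// The keys expected inside `config` are not visible in the diff.
interface VoiceChatNative {
  initialize(config: Record<string, unknown>): Promise<unknown>;
  startConversation(
    systemPrompt: string,
    voiceId: string,
    allowInterruption: boolean,
  ): Promise<unknown>;
  endConversation(): Promise<unknown>;
  updateCredentials(config: Record<string, unknown>): Promise<unknown>;
}

// Hypothetical helper: initialize the module, start a conversation, and
// return a function the caller can use to end the conversation later.
async function startVoiceSession(
  native: VoiceChatNative,
  config: Record<string, unknown>,
  systemPrompt: string,
  voiceId: string,
  allowInterruption: boolean = true, // barge-in is on by default per the README
): Promise<() => Promise<unknown>> {
  await native.initialize(config);
  await native.startConversation(systemPrompt, voiceId, allowInterruption);
  return () => native.endConversation();
}
```

In the app, the first argument would presumably be `NativeModules.VoiceChatModule`, and `updateCredentials` would be called with a fresh config when credentials change. Taking the module as a parameter also lets the call order be checked against a fake without a device.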
