English | 中文
Streams to River is an English learning application. It records, extracts, and manages English words, sentences, and related contexts encountered in daily life, and schedules periodic review and memorization based on the Ebbinghaus forgetting curve.
During development, TRAE was used extensively for coding, debugging, commenting, and unit-test writing. Coze workflows made it possible to quickly integrate capabilities such as image-to-text, real-time chat, speech recognition, and word highlighting.
Streams to River V2 is a word learning and language processing microservice system built on the Hertz and Kitex frameworks. It provides a complete solution from API services to RPC implementation, including core functional modules such as user authentication, word management, review progress tracking, real-time chat, speech recognition, and image-to-text conversion, and uses MySQL and Redis for data storage and cache optimization.
The system is designed to give users a comprehensive language learning platform, combining traditional word learning methods with modern AI technology to improve learning effectiveness and user experience. It supports word addition, querying, tag management, review progress tracking, and intelligent chat, and integrates multimodal capabilities such as speech recognition and image-to-text conversion to offer richer and more convenient ways to learn.
The system adopts a microservice architecture with a separated front end and back end, divided into the following layers:
- API Service Layer: Based on the Hertz framework, providing HTTP API interfaces to handle requests from the front-end
- RPC Service Layer: Based on the Kitex framework, implementing business logic to handle requests from the API service layer
- Data Access Layer: Including MySQL database and Redis cache, responsible for persistent storage and caching of data
- Intelligent Processing Layer: Integrating large language models (LLM), speech recognition (ASR), and image-to-text functionality
```mermaid
sequenceDiagram
    participant Client as Client
    participant API as API Service Layer
    participant RPC as RPC Service Layer
    participant DAL as Data Access Layer
    participant DB as Database/Cache
    participant External as External Services
    Client->>API: HTTP Request
    API->>RPC: RPC Call
    RPC->>DAL: Data Operation
    DAL->>DB: CRUD Operation
    RPC->>External: Call External Service (e.g., LLM)
    External-->>RPC: Return Result
    RPC-->>API: Return RPC Response
    API-->>Client: Return HTTP Response
```
| Category | Technology/Framework | Description |
|---|---|---|
| HTTP Framework | Hertz | High-performance Golang HTTP framework for building API services |
| RPC Framework | Kitex | High-performance, highly extensible Golang RPC framework for building microservices |
| Data Storage | MySQL | Relational database for persistent storage of user data, word information, etc. |
| Cache Service | Redis | In-memory database for caching hot data to improve system performance |
| Communication Protocol | HTTP/RESTful | For communication between the front end and the API service layer |
| | RPC | For communication between the API service layer and the RPC service layer |
| | WebSocket | For real-time communication, such as the speech recognition service |
| | Server-Sent Events (SSE) | For streaming communication, such as the real-time chat functionality |
| AI/ML Integration | Large Language Model (LLM) | For intelligent chat, content generation, and word highlighting |
| | Speech Recognition (ASR) | For converting speech to text |
| | Image Processing | For image-to-text functionality |
| Monitoring and Observability | OpenTelemetry | For system monitoring, metrics collection, and performance analysis |
| Security | JWT | For user authentication and authorization |
| Deployment and Service Discovery | Service Registration and Discovery | For microservice registration and discovery |
| | Dynamic Configuration Management | For dynamic management of system configuration |
The user management module is responsible for user registration, login, and information management, with the following main functions:
- User registration: Supports registration with username, email, and password
- User login: Supports login with username and password, returns JWT token
- User information retrieval: Retrieves information of the currently logged-in user
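For illustration, here is a minimal sketch of the token handling behind the login flow above, assuming the github.com/golang-jwt/jwt/v5 library and the JWT_SECRET value configured later in this README. The issueToken and parseToken helpers are hypothetical; the project's actual claims and middleware may differ.

```go
package auth

import (
	"time"

	"github.com/golang-jwt/jwt/v5"
)

// issueToken signs a JWT carrying the user ID, using the JWT_SECRET value
// from the configuration shown later in this README.
func issueToken(userID int64, secret string) (string, error) {
	claims := jwt.MapClaims{
		"user_id": userID,
		"iat":     time.Now().Unix(),
		"exp":     time.Now().Add(24 * time.Hour).Unix(), // lifetime is an assumption
	}
	token := jwt.NewWithClaims(jwt.SigningMethodHS256, claims)
	return token.SignedString([]byte(secret))
}

// parseToken verifies the signature and extracts the user ID for later requests.
func parseToken(tokenString, secret string) (int64, error) {
	token, err := jwt.Parse(tokenString, func(t *jwt.Token) (interface{}, error) {
		return []byte(secret), nil
	})
	if err != nil {
		return 0, err
	}
	claims, ok := token.Claims.(jwt.MapClaims)
	if !ok {
		return 0, jwt.ErrTokenInvalidClaims
	}
	uid, ok := claims["user_id"].(float64) // JSON numbers decode as float64
	if !ok {
		return 0, jwt.ErrTokenInvalidClaims
	}
	return int64(uid), nil
}
```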
The word learning system is the core functional module of the system, responsible for word management, review, and tag management, with the following main functions:
- Word management: Adding, querying, retrieving details, and listing words
- Tag management: Supporting classification and tagging of words
- Review system: Generating review lists, tracking review progress, and verifying answers
- Word details: Providing word definitions, phonetic symbols, example sentences, and translations
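The review system follows the Ebbinghaus forgetting curve mentioned at the top of this README. The intervals below are purely illustrative (the project's actual schedule may differ); this is a minimal sketch of how the next review time for a words_recite_record row could be computed.

```go
package review

import "time"

// reviewIntervals is an illustrative Ebbinghaus-style schedule; the intervals
// actually used by the project may differ.
var reviewIntervals = []time.Duration{
	1 * 24 * time.Hour,
	2 * 24 * time.Hour,
	4 * 24 * time.Hour,
	7 * 24 * time.Hour,
	15 * 24 * time.Hour,
	30 * 24 * time.Hour,
}

// nextReviewTime returns when a word should next be reviewed, given how many
// times it has already been reviewed successfully.
func nextReviewTime(reviewCount int, lastReviewed time.Time) time.Time {
	if reviewCount >= len(reviewIntervals) {
		reviewCount = len(reviewIntervals) - 1 // cap at the longest interval
	}
	return lastReviewed.Add(reviewIntervals[reviewCount])
}
```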
The intelligent chat module is based on large language models (LLM) and provides real-time chat functionality with the following main features:
- Streaming communication: Using Server-Sent Events (SSE) for streaming responses
- Session management: Supporting session ID and context management
- Content highlighting: Supporting highlighting of words in chat content
- Sensitive content review: Filtering chat content for sensitive words
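The chat service itself is built on Hertz; the sketch below uses only the Go standard library to illustrate the SSE wire format that the streaming responses above rely on. The route name and chunk source are assumptions.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

// chatHandler streams chunks of a reply as Server-Sent Events. The chunk
// source is faked here; in the real service it would come from the LLM.
func chatHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	w.Header().Set("Connection", "keep-alive")

	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}

	for _, chunk := range []string{"Hello", ", ", "world", "!"} {
		fmt.Fprintf(w, "data: %s\n\n", chunk) // each SSE event is "data: <payload>\n\n"
		flusher.Flush()
		time.Sleep(100 * time.Millisecond)
	}
}

func main() {
	http.HandleFunc("/api/chat/stream", chatHandler) // route name is an assumption
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```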
The multimodal processing module integrates speech recognition and image-to-text functionality, providing users with multiple input methods:
- Speech recognition: Converting speech input to text
- Image-to-text: Converting content in images to text descriptions
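As a rough illustration of the WebSocket flow used for speech recognition, the client-side sketch below streams audio frames and reads back text, assuming the gorilla/websocket package. The endpoint path, frame size, and message framing are assumptions, not the service's actual protocol.

```go
package main

import (
	"log"
	"os"

	"github.com/gorilla/websocket"
)

func main() {
	// The endpoint path is an assumption; the real one is defined by the service.
	conn, _, err := websocket.DefaultDialer.Dial("ws://localhost:8080/api/asr", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	audio, err := os.ReadFile("sample.wav")
	if err != nil {
		log.Fatal(err)
	}

	// Send the audio in fixed-size binary frames.
	const frameSize = 3200
	for off := 0; off < len(audio); off += frameSize {
		end := off + frameSize
		if end > len(audio) {
			end = len(audio)
		}
		if err := conn.WriteMessage(websocket.BinaryMessage, audio[off:end]); err != nil {
			log.Fatal(err)
		}
	}

	// Read back the recognized text.
	_, msg, err := conn.ReadMessage()
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("recognized: %s", msg)
}
```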
The documentation service module provides API documentation and usage guides for the system, with the following main functions:
- API documentation generation: Automatically generating API documentation for the system
- Markdown processing: Supporting processing and conversion of Markdown format documents
- HTML generation: Converting Markdown documents to HTML format
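A minimal sketch of the Markdown-to-HTML step, assuming the goldmark library (the project may use a different renderer):

```go
package docs

import (
	"bytes"

	"github.com/yuin/goldmark"
)

// renderMarkdown converts a Markdown document into an HTML fragment.
func renderMarkdown(src []byte) (string, error) {
	var buf bytes.Buffer
	if err := goldmark.Convert(src, &buf); err != nil {
		return "", err
	}
	return buf.String(), nil
}
```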
The system monitoring and management module is responsible for system monitoring, configuration management, and log processing, with the following main functions:
- Performance monitoring: Using OpenTelemetry for performance monitoring and metrics collection
- Configuration management: Supporting dynamic configuration management and environment variable reading
- Log management: Providing unified log recording and management functionality
- Service registration and discovery: Supporting microservice registration and discovery
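A minimal sketch of OpenTelemetry tracing setup with an OTLP/gRPC exporter; the exporter choice, endpoint defaults, and tracer name here are assumptions.

```go
package monitor

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// initTracing installs a global tracer provider that exports spans over OTLP/gRPC
// (by default to localhost:4317) and returns a shutdown function.
func initTracing(ctx context.Context) (func(context.Context) error, error) {
	exporter, err := otlptracegrpc.New(ctx)
	if err != nil {
		return nil, err
	}
	tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exporter))
	otel.SetTracerProvider(tp)
	return tp.Shutdown, nil
}

// doWork shows how a span would be created around a unit of work.
func doWork(ctx context.Context) {
	ctx, span := otel.Tracer("streams-to-river").Start(ctx, "doWork")
	defer span.End()
	_ = ctx // real work, using ctx, would go here
}
```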
For more information, please refer to repome
Update the config file stream2river:

```yaml
LLM:
  ChatModel:
    # Go to the Volcano Ark platform https://console.volcengine.com/ark/region:ark+cn-beijing/model/detail?Id=doubao-1-5-pro-32k to apply for the latest Doubao Pro text model and obtain its api_key and model_id
    APIKey: ""
    Model: ""
Coze:
  BaseURL: "https://api.coze.cn"
  # The following fields are configured with reference to rpcservice/biz/chat/coze/README.md
  WorkflowID: ""
  Auth: ""
  Token: ""
  ClientID:
  PublishKey:
  PrivateKey:
```
Update the config file stream2river:

```yaml
LLM:
  AsrModel:
    # Read the "Sentence Recognition" access document in advance (https://www.volcengine.com/docs/6561/80816), enable the sentence recognition capability on the Volcano Ark platform at https://console.volcengine.com/speech/service/15, and fill in the AppID / Token / Cluster provided by the platform
    AppID: ""
    Token: ""
    Cluster: ""
  VisionModel:
    # Go to the Volcano Ark platform https://console.volcengine.com/ark/region:ark+cn-beijing/model/detail?Id=doubao-1-5-vision-lite to apply for Doubao's latest Vision lite model and obtain its api_key and model_id
    APIKey: ""
    Model: ""
# JWT_SECRET is used to sign and verify JWT tokens. It must be a long, random string.
# It is recommended to use at least 32 bytes (256 bits) of random data.
# You can generate a secure random string using the following commands:
#   openssl rand -base64 32
#   or in Python: import secrets; print(secrets.token_urlsafe(32))
JWT_SECRET: your_secret_key
```
Before running the backend services, make sure you have installed Docker and Docker Compose. For more information, see https://docs.docker.com/engine/install/ and https://docs.docker.com/compose/install/ .
After starting the Docker service, run `./dockerfile/run.sh` in the project root directory to start the backend services.
Refer to the client/README.md document.
Refer to the Coze Config document.
Implement a function for retrieving the words to be recited. The basic logic is as follows:
- From the "words_recite_record" table, select all records of the current user (whose user_id is passed as a parameter) whose "next_review_time" is earlier than the current time.
- For each record, obtain detailed information from the "words" table using their "word_id".
- For each record, generate three types of review questions. Each question contains the question stem and four answer options.
- The first type: select the correct Chinese meaning. The question stem is the "word_name" from the "words" table. The options have two parts: the correct option is the "explanations" field from the "words" table, and the other three are drawn at random from the "answer_list" table. To draw them, find the records in "answer_list" whose "user" equals the current user, randomly choose 3 order_ids between 1 and the maximum order_id, and use the "description" field of those records as the options. Exclusion logic must also be implemented so that the randomly selected options do not repeat the correct answer.
- The second type: select the correct English meaning, defined by the constant "CHOOSE_EN". The logic is similar to the first type, except that the question stem is the "explanations" field from the "words" table and the options come from the "word_name" field of "answer_list".
- The third type: select the correct Chinese meaning based on the pronunciation, defined by the constant "PRONOUNCE_CHOOSE". The logic again mirrors the first type, except that the question stem is the "pronounce_us" field from the "words" table and the options come from the "description" field of "answer_list".
Generated Code: review_list.go
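The generated review_list.go contains the actual implementation; the sketch below only illustrates the question-building and exclusion logic described above. The CHOOSE_CN constant, the Question struct, and the pickDistractors helper are hypothetical.

```go
package review

import "math/rand"

// Question type constants. CHOOSE_EN and PRONOUNCE_CHOOSE are named in the
// description above; CHOOSE_CN is assumed for the first type.
const (
	CHOOSE_CN        = 1
	CHOOSE_EN        = 2
	PRONOUNCE_CHOOSE = 3
)

type Question struct {
	Type    int
	Stem    string
	Options []string // four options, with the correct answer mixed in
}

// buildQuestions creates the three question types for a single word.
// pickDistractors is a hypothetical helper: it randomly chooses 3 rows from
// answer_list for the current user (order_id in [1, max order_id]) and returns
// the requested field, excluding values equal to the correct answer.
func buildQuestions(wordName, explanations, pronounceUS string,
	pickDistractors func(field, exclude string) []string) []Question {

	shuffle := func(correct string, distractors []string) []string {
		opts := append([]string{correct}, distractors...)
		rand.Shuffle(len(opts), func(i, j int) { opts[i], opts[j] = opts[j], opts[i] })
		return opts
	}

	return []Question{
		{ // type 1: stem is the word, options are Chinese meanings ("description")
			Type:    CHOOSE_CN,
			Stem:    wordName,
			Options: shuffle(explanations, pickDistractors("description", explanations)),
		},
		{ // type 2: stem is the Chinese meaning, options are words ("word_name")
			Type:    CHOOSE_EN,
			Stem:    explanations,
			Options: shuffle(wordName, pickDistractors("word_name", wordName)),
		},
		{ // type 3: stem is the US pronunciation, options are Chinese meanings
			Type:    PRONOUNCE_CHOOSE,
			Stem:    pronounceUS,
			Options: shuffle(explanations, pickDistractors("description", explanations)),
		},
	}
}
```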
Copyright (c) 2025 Bytedance Ltd. and/or its affiliates. All rights reserved.
Licensed under the MIT license.