[BUG] esp32-client can not decode incoming audio frames #1477

@monsterboom

Description

I have successfully run ten-agent on my PC, and the page at http://localhost:3000 connects and holds a normal conversation. I used Deepgram and ByteDance.
The esp32-client also compiles successfully, and I flashed it to the development board.

This is my esp32-client app config.

// main/app_config.h 
#pragma once


//LLM Agent Service
#define TENAI_AGENT_URL       "http://192.168.230.101:8080"

// LLM Agent Graph: select either OpenAI or Gemini
// #define CONFIG_GRAPH_OPENAI   /* OpenAI: audio only */
#define CONFIG_GRAPH_GEMINI     /* Gemini: video and audio, but no Chinese language support */

/* greeting */
#define GREETING               "Can I help You?"
#define PROMPT                 ""

/* different settings for different agent graph */
#if defined(CONFIG_GRAPH_OPENAI)
#define GRAPH_NAME             "va_openai_v2v"

#define V2V_MODEL              "gpt-realtime"
#define LANGUAGE               "en-US"
#define VOICE                  "ash"
#elif defined(CONFIG_GRAPH_GEMINI)
#define GRAPH_NAME             "va_gemini_v2v"
#else
#error "not config graph for aiAgent"
#endif

// LLM Agent Task Name
#define AI_AGENT_NAME          "tenai0125-11"
// LLM Agent Channel Name
#define AI_AGENT_CHANNEL_NAME  "agora_0dr2o4"
// LLM User Id
#define AI_AGENT_USER_ID        12345 // user id, for device



/* function config */
/* audio codec */
#define CONFIG_USE_G711U_CODEC
/* video process */
// #define CONFIG_AUDIO_ONLY
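
For readers matching this config to the device log: the values of AI_AGENT_NAME, AI_AGENT_USER_ID and AI_AGENT_CHANNEL_NAME are the ones that show up as `request_id`, `uid` and `channel_name` in the request body the device prints. Below is a minimal sketch of how such a body could be assembled. `build_start_request` is a hypothetical helper written for illustration, not the actual esp32-client code, and it assumes the strings contain no characters that need JSON escaping.

```c
#include <stdio.h>

/* Hypothetical helper (not part of esp32-client): formats a JSON body with
 * the three fields seen in the device log. It performs no JSON escaping, so
 * it assumes request_id and channel_name contain no quotes or backslashes.
 * Returns the number of characters written, or -1 if the buffer is too
 * small or formatting fails. */
static int build_start_request(char *buf, size_t buf_len,
                               const char *request_id,
                               unsigned int uid,
                               const char *channel_name)
{
    int n = snprintf(buf, buf_len,
                     "{\"request_id\":\"%s\",\"uid\":%u,\"channel_name\":\"%s\"}",
                     request_id, uid, channel_name);
    if (n < 0 || (size_t)n >= buf_len) {
        return -1; /* encoding error or truncated output */
    }
    return n;
}
```

Calling it with the values from this config, `build_start_request(buf, sizeof buf, "tenai0125-11", 12345u, "agora_0dr2o4")`, produces the same three fields as the logged request, without the log's pretty-printing.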

And these are my device logs.

wifi event 4
wifi sta mode connect.
I (1347) wifi:<ba-add>idx:0 (ifx:0, 48:bd:3d:93:02:10), tid:0, ssn:0, winSize:64
I (1408) wifi:AP's beacon interval = 102400 us, DTIM period = 1
I (2835) esp_netif_handlers: sta ip: 192.168.230.116, mask: 255.255.255.0, gw: 192.168.230.254
wifi event 0
got ip: 
192.168.230.116HTTP_EVENT_ON_CONNECTED
HTTP_EVENT_HEADER_SENT
http_with_url request ={
        "request_id":   "tenai0125-11",
        "uid":  12345,
        "channel_name": "agora_0dr2o4"
}
HTTP_EVENT_ON_HEADER, key=Access-Control-Allow-Credentials, value=true
HTTP_EVENT_ON_HEADER, key=Access-Control-Allow-Headers, value=*
HTTP_EVENT_ON_HEADER, key=Access-Control-Allow-Methods, value=*
HTTP_EVENT_ON_HEADER, key=Access-Control-Allow-Origin, value=*
HTTP_EVENT_ON_HEADER, key=Access-Control-Expose-Headers, value=*
HTTP_EVENT_ON_HEADER, key=Content-Type, value=application/json; charset=utf-8
HTTP_EVENT_ON_HEADER, key=Date, value=Wed, 10 Sep 2025 11:11:18 GMT
HTTP_EVENT_ON_HEADER, key=Content-Length, value=320
HTTP_EVENT_ON_DATA, len=216, data={"code":"0","data":{"appId":"d5ca7c***0257","channel_name":"agora_0dr2o4","token":"007eJxSYKjXrXV***QzMrJIMzEyT7EwNbAwT7EwMDNOszQwMjW/FnMwoyGQkaFdj4e
HTTP_EVENT_ON_DATA, len=104, data=ZiYGRgYWBkQHEZ***m7AyGBoZm5iCtEA0QAUAAQAA//8T7yNU","uid":12345},"msg":"success"}-Allow-Methods: *
Access-Control-Allow-Origin: *
Access-Control-Expose-Headers: *
Content-Type: application/json; charset=utf-8
Date: Wed, 10 Sep 2025 11:11:18 GMT
Content-Length: 320

{"code":"0","data":{"appId":"d5ca7c***0257","channel_name":"agora_0dr2o4","token":"007eJxSYK***AwT7EwMDNOszQwMjW/FnMwoyGQkaFdj4e
HTTP_EVENT_ON_FINISH
code: 0
msg: success
appId: d5ca7***257
token: 007eJxSY***QAUAAQAA//8T7yNU
HTTPS Status = 200, content_length = 320
HTTP_EVENT_DISCONNECTED
I (3040) AUDIO_THREAD: The esp_periph task allocate stack on external memory
I (3047) AUDIO_THREAD: The button_task task allocate stack on external memory
I (3055) AUDIO_THREAD: The input_key_service task allocate stack on internal memory
I (3068) DRV8311: ES8311 in Slave mode
I (3080) gpio: GPIO[48]| InputEn: 0| OutputEn: 1| OpenDrain: 0| Pullup: 0| Pulldown: 0| Intr:0 
W (3085) I2C_BUS: I2C bus has been already created, [port:0]
I (3089) ES7210: ES7210 in Slave mode
I (3099) ES7210: Enable ES7210_INPUT_MIC1
I (3103) ES7210: Enable ES7210_INPUT_MIC2
I (3106) ES7210: Enable ES7210_INPUT_MIC3
W (3110) ES7210: Enable TDM mode. ES7210_SDP_INTERFACE2_REG12: 2
I (3116) ES7210: config fmt 60
I (3118) AUDIO_HAL: Codec mode is 3, Ctrl:1
I (3126) AUDIO_THREAD: The audio_send_task task allocate stack on external memory
I (3129) AUDIO_PIPELINE: link el->rb, el:0x3c1c3878, tag:i2s, rb:0x3c1c46bc
I (3135) AUDIO_PIPELINE: link el->rb, el:0x3c1c3c18, tag:algo, rb:0x3c1c6704
audio recorder has been created
I (3147) AUDIO_PIPELINE: link el->rb, el:0x3c1c6b6c, tag:raw, rb:0x3c1c7188
~~~~~start agora rtc demo~~~~
[1757531446.275][crt] log_mod: id= 0 level=wrn name=
[1757531446.278][crt] log_mod: id= 1 level=wrn name=api
[1757531446.283][crt] log_mod: id= 2 level=wrn name=user
[1757531446.288][crt] log_mod: id= 3 level=wrn name=net
[1757531446.293][crt] log_mod: id= 4 level=wrn name=srv
[1757531446.298][crt] log_mod: id= 5 level=wrn name=rtn
[1757531446.303][crt] log_mod: id= 6 level=wrn name=cb
[1757531446.308][crt] log_mod: id= 7 level=wrn name=snd_aud
[1757531446.313][crt] log_mod: id= 8 level=wrn name=rcv_aud
[1757531446.318][crt] log_mod: id= 9 level=wrn name=snd_vid
[1757531446.324][crt] log_mod: id=10 level=wrn name=rcv_vid
[1757531446.329][crt] log_mod: id=11 level=wrn name=stat
[1757531446.334][crt] log_mod: id=12 level=wrn name=fec
[1757531446.339][crt] log_mod: id=13 level=wrn name=bwe
[1757531446.344][crt] log_mod: id=14 level=wrn name=argus
[1757531446.349][crt] log_mod: id=15 level=wrn name=aud_codec
[1757531446.354][crt] log_mod: id=16 level=wrn name=trans
[1757531446.359][crt] log_mod: id=17 level=wrn name=proxy
[1757531446.365][crt] log_mod: id=18 level=wrn name=rtm
[1757531446.370][crt] log_mod: id=19 level=wrn name=param
[1757531446.375][crt] log_mod: id=20 level=wrn name=crypt
[1757531446.380][crt] log_mod: id=21 level=wrn name=aut
[1757531446.385][crt] log_mod: id=22 level=wrn name=rdt
[1757531446.390][crt] log_mod: id=23 level=wrn name=p2p
[1757531446.395][crt] log_mod: id=24 level=wrn name=data_stream
[1757531446.400][crt] log service initialize
[1757531446.406][crt] PACKAGE_INFO: v1.9.5
[1757531446.408][crt] Version 1.9.5 built @ Jan  2 2025 14:59:09
[1757531446.414][crt] AHPL Git b: Unknown_Branch c: Unknown_Commit, SDK Git b: Unknown_Branch c: Unknown_Commit
[1757531446.424][crt] Hardware model: LiteOS 1.0
[1757531446.430][wrn] Can not get argus addr
[1757531446.433][crt][api] init service appid=d5ca***Bnlm7AyGBoZm5iCtEA0QAUAAQAA//8T7yNU event_handler=0x3fcb773c option=0x3fcb76c0
[1757531446.459][crt][api] init service done, rval=0
~~~~~agora_rtc_init success~~~~
[1757531446.472][crt][bwe] set cc version 1
[1757531446.561][crt][ch:1] [vocs] CCGA res from 106.*****.130: channel="agora_0dr2o4" code=0
[1757531446.563][crt][ch:1] add vos server 223.*****.89:4002
[1757531446.564][crt][ch:1] add vos server 120.*******.20:4005
[1757531446.570][crt][ch:1] add vos server 36.*******.78:4007
[1757531446.575][crt][ch:1] Choose vos server 223.*****.89:4002
[1757531446.625][crt][rtn][ch:1] enc=1, login vos succeed, ip 223.*****.89
[1757531446.626][crt][ch:1] join_spend_time=127
I (3512) AUDIO_THREAD: The i2s task allocate stack on external memory
[conn-1] Join the channel agora_0dr2o4 successfully, uid 12345 elapsed 156 ms
I (3515) AUDIO_ELEMENT: [i2s-0x3c1c3878] Element task created
I (3528) AUDIO_THREAD: The algo task allocate stack on external memory
I (3536) AUDIO_ELEMENT: [algo-0x3c1c3c18] Element task created
I (3542) AUDIO_ELEMENT: [raw-0x3c1c454c] Element task created
[1757531446.664][err][rcv_aud][ch:1] incoming audio type 122 mismatch codec 4
[1757531446.673][err] [ch:1] cb on error 221, err_msg Unable to decode incoming audio frame
[conn-166425] Remote user "1" has joined the channel, elapsed 35 ms
Error 221 is captured. Error msg "Unable to decode incoming audio frame"
[1757531446.682][err][rcv_aud][ch:1] incoming audio type 122 mismatch codec 4
I (3548) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8206096 Bytes, Inter:134344 Bytes, Dram:134344 Bytes, Dram largest free:69632Bytes

[1757531446.700][err] [ch:1] cb on error 221, err_msg Unable to decode incoming audio frame
I (3597) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1
[1757531446.724][err][rcv_aud][ch:1] incoming audio type 122 mismatch codec 4
[1757531446.736][err] [ch:1] cb on error 221, err_msg Unable to decode incoming audio frame
[1757531446.744][err][rcv_aud][ch:1] incoming audio type 122 mismatch codec 4
Error 221 is captured. Error msg "Unable to decode incoming audio frame"
Error 221 is captured. Error msg "Unable to decode incoming audio frame"
[1757531446.751][err] [ch:1] cb on error 221, err_msg Unable to decode incoming audio frame
I (3611) AUDIO_ELEMENT: [algo] AEL_MSG_CMD_RESUME,state:1
[1757531446.773][err][rcv_aud][ch:1] incoming audio type 122 mismatch codec 4
Error 221 is captured. Error msg "Unable to decode incoming audio frame"
[1757531446.785][err] [ch:1] cb on error 221, err_msg Unable to decode incoming audio frame
I (3660) AFE_VC: afe interface for voice communication

[1757531446.800][err][rcv_aud][ch:1] incoming audio type 122 mismatch codec 4
[1757531446.812][err] [ch:1] cb on error 221, err_msg Unable to decode incoming audio frame
Error 221 is captured. Error msg "Unable to decode incoming audio frame"
Error 221 is captured. Error msg "Unable to decode incoming audio frame"
[1757531446.820][err][rcv_aud][ch:1] incoming audio type 122 mismatch codec 4
I (3660) AUDIO_PIPELINE: Pipeline started
[1757531446.840][err] [ch:1] cb on error 221, err_msg Unable to decode incoming audio frame
I (3687) AFE_VC: AFE version: VC_V220727

[1757531446.853][err][rcv_aud][ch:1] incoming audio type 122 mismatch codec 4
[1757531446.864][err] [ch:1] cb on error 221, err_msg Unable to decode incoming audio frame
Error 221 is captured. Error msg "Unable to decode incoming audio frame"
[1757531446.874][err][rcv_aud][ch:1] incoming audio type 122 mismatch codec 4
[1757531446.885][err] [ch:1] cb on error 221, err_msg Unable to decode incoming audio frame
Error 221 is captured. Error msg "Unable to decode incoming audio frame"
Error 221 is captured. Error msg "Unable to decode incoming audio frame"
[1757531446.895][err][rcv_aud][ch:1] incoming audio type 122 mismatch codec 4
I (3727) AUDIO_ELEMENT: [raw-0x3c1c6b6c] Element task created
[1757531446.913][err] [ch:1] cb on error 221, err_msg Unable to decode incoming audio frame
I (3739) AFE_VC: Initial auido front-end, total channel: 2, mic num: 1, ref num: 1

[1757531446.929][err][rcv_aud][ch:1] incoming audio type 122 mismatch codec 4
[1757531446.943][err] [ch:1] cb on error 221, err_msg Unable to decode incoming audio frame
Error 221 is captured. Error msg "Unable to decode incoming audio frame"
Error 221 is captured. Error msg "Unable to decode incoming audio frame"
[1757531446.952][err][rcv_aud][ch:1] incoming audio type 122 mismatch codec 4
I (3802) AUDIO_THREAD: The i2s task allocate stack on external memory
[1757531446.971][err] [ch:1] cb on error 221, err_msg Unable to decode incoming audio frame
I (3818) AFE_VC: aec_init: 1, se_init: 0, vad_init: 0

[1757531446.987][err][rcv_aud][ch:1] incoming audio type 122 mismatch codec 4
Error 221 is captured. Error msg "Unable to decode incoming audio frame"
[1757531446.999][err] [ch:1] cb on error 221, err_msg Unable to decode incoming audio frame
I (3860) AUDIO_ELEMENT: [i2s-0x3c1c6e28] Element task created
[1757531447.014][err][rcv_aud][ch:1] incoming audio type 122 mismatch codec 4
[1757531447.026][err] [ch:1] cb on error 221, err_msg Unable to decode incoming audio frame
Error 221 is captured. Error msg "Unable to decode incoming audio frame"
Error 221 is captured. Error msg "Unable to decode incoming audio frame"
[1757531447.036][err][rcv_aud][ch:1] incoming audio type 122 mismatch codec 4
I (3874) AFE_VC: wakenet_init: 0, voice_communication_agc_init: 0

[1757531447.054][err] [ch:1] cb on error 221, err_msg Unable to decode incoming audio frame
I (3902) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8190780 Bytes, Inter:133396 Bytes, Dram:133396 Bytes, Dram largest free:69632Bytes

[1757531447.070][err][rcv_aud][ch:1] incoming audio type 122 mismatch codec 4
I (3966) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1
Error 221 is captured. Error msg "Unable to decode incoming audio frame"
[1757531447.091][err] [ch:1] cb on error 221, err_msg Unable to decode incoming audio frame
I (3943) AFE_VC: ns_mode: 0

[1757531447.112][err][rcv_aud][ch:1] incoming audio type 122 mismatch codec 4
[1757531447.122][err] [ch:1] cb on error 221, err_msg Unable to decode incoming audio frame
[1757531447.130][err][rcv_aud][ch:1] incoming audio type 122 mismatch codec 4
[1757531447.137][err] [ch:1] cb on error 221, err_msg Unable to decode incoming audio frame
Error 221 is captured. Error msg "Unable to decode incoming audio frame"
Error 221 is captured. Error msg "Unable to decode incoming audio frame"
Error 221 is captured. Error msg "Unable to decode incoming audio frame"
[1757531447.146][err][rcv_aud][ch:1] incoming audio type 122 mismatch codec 4
I (3979) AUDIO_PIPELINE: Pipeline started
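
Every `rcv_aud` error in the log above carries the same two numbers (incoming audio type 122, codec 4), so the failure looks like a consistent codec mismatch rather than sporadic corruption. The log does not say which codecs these numbers stand for, and that codec 4 is the one selected by CONFIG_USE_G711U_CODEC is only an assumption. For triage, a minimal sketch that pulls the two numbers out of such a line; `parse_codec_mismatch` is a hypothetical helper, not part of the SDK or esp32-client.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical triage helper: extracts the two numbers from a log line such
 * as "... incoming audio type 122 mismatch codec 4".
 * Returns 1 and fills both outputs on a match, 0 otherwise (outputs are
 * left untouched when there is no match). */
static int parse_codec_mismatch(const char *line,
                                int *incoming_type,
                                int *local_codec)
{
    static const char marker[] = "incoming audio type ";
    const char *p = strstr(line, marker);
    int type = 0;
    int codec = 0;

    if (p == NULL) {
        return 0; /* not a mismatch line */
    }
    if (sscanf(p, "incoming audio type %d mismatch codec %d", &type, &codec) != 2) {
        return 0; /* marker found, but the numbers are missing */
    }
    *incoming_type = type;
    *local_codec = codec;
    return 1;
}
```

Feeding each log line through this function and collecting the distinct (type, codec) pairs would confirm whether the whole session fails with the same pair, which is what the excerpt above suggests.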

I feel like I'm very close to getting this fully running, and I hope it can be resolved.

Environment

PC: Mac mini, Device: ESP32S3 Korvo 2 V3

Steps to reproduce

Follow the documentation.

Expected behavior

No errors: the device decodes incoming audio frames.

Severity

Critical

Additional Information

No response
