Getting: OSError: [Errno -9996] Invalid input device (no default output device) #10
Replies: 7 comments
-
Yeah, the Docker setup is bulky and running the app natively is best; otherwise you might need to do some troubleshooting on some systems. Based on the logs, it doesn't seem to be able to pick up your microphone. In Windows, check the microphone privacy settings to make sure it is toggled on, under Privacy & security > Microphone. Also make sure apps can access the mic, and since the app will run as either Python or WSL, make sure those are toggled on too. Also make sure your browser microphone settings are good.
-
Thanks for the quick reply! I'll check the privacy settings and maybe give running it natively a go, but like I said, I'm a complete newb with no knowledge regarding these things. I appreciate you taking the time to respond!
-
I managed to install it manually and got it to work (with a lot of trial and error). One question: I'm running the Ollama model locally and I noticed the response time takes quite a while. Also, the ChatGPT response goes over the 250-character limit. Is there a way to set some parameters so it will keep its answers within the limit?
-
Yes, on the first response Ollama has to load the model; once the model is in memory it should run faster. It also takes longer if you are running on CPU only, and runs faster on a fast GPU. I have an RTX 4090 and with llama3 it takes around 4 seconds once the model is loaded. OpenAI 4o-mini with a good internet connection seems to work quickly. If by the ChatGPT response you mean the Ollama response and not OpenAI: different models follow the prompt differently. I noticed llama3.2 is more chatty and doesn't follow the limit the way llama3.1 does, so download the Ollama llama3.1 7b and see if it works better than 3.2. I have noticed Mistral is really bad and doesn't follow the prompt as it should. You can also add a token limit to the Ollama payload request in app.py (starts at line 416): just change num_predict from -2 to 250, and adjust as needed. A rough sketch of that kind of payload is shown below.
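To illustrate what that tweak does, here is a minimal standalone sketch of an Ollama chat request; it is not the project's actual app.py code, and the model name and prompt are hypothetical:

```python
import requests

# Minimal Ollama /api/chat request. The "options" dict is where the token
# limit lives: num_predict caps how many tokens the model may generate
# (-2 means "fill the context"; a small positive value forces short replies).
payload = {
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "Introduce yourself briefly."}],
    "stream": False,
    "options": {
        "num_predict": 250,  # was -2; lower it to cut responses shorter
    },
}

resp = requests.post("http://localhost:11434/api/chat", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```

Note that num_predict limits tokens, not characters, so 250 tokens is still well over 250 characters; tune it down further if the speech output keeps getting cut off.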
The hard-coded prompt gives the model a limit not to go over, though sometimes it doesn't listen to it. The audio also has a character limit: if I remember right, XTTS cuts off after a certain point since it doesn't do well with longer responses, while with ElevenLabs or OpenAI the limit is a bit longer. It all depends on the setup (providers and models being used) and what type of hardware you are running on. You should also be able to make some speed changes in app.py, for example by switching the transcription model to tiny.en (I haven't tried this on this project, but tiny.en works on other projects I have used); a rough sketch of that kind of change follows.
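As an illustration only (assuming the project loads a faster-whisper model for speech-to-text, which is an assumption about this codebase, not something confirmed here), the change might look like this:

```python
from faster_whisper import WhisperModel

# Before (hypothetical): a larger Whisper model, more accurate but slower.
# model = WhisperModel("base.en", device="cuda", compute_type="float16")

# After: tiny.en trades some accuracy for noticeably faster transcription,
# which helps a lot on CPU-only machines.
model = WhisperModel("tiny.en", device="cpu", compute_type="int8")

segments, _info = model.transcribe("input.wav", beam_size=5)
text = " ".join(segment.text for segment in segments)
print(text)
```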
-
I actually just tried modifying num_predict and also setting a max_token parameter, and it doesn't work. I think the only way to limit the model in Ollama is by adjusting the model's configuration file on the host; the default context size for Ollama is 2048 tokens, so you would need to adjust it for each model as needed (a sketch of that is below). You can also edit each character's file in characters/(character)/(character).txt and change the character limit in its prompt to a lower number, and see if the model follows its prompt instructions better. So instead of 500, change it to 250, etc.
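For reference, the standard Ollama way to bake those limits into a model on the host is a Modelfile (this is generic Ollama usage, not something specific to this project, and the derived model name here is made up):

```
# Modelfile: derive a response-capped variant of llama3.1
FROM llama3.1

# num_predict caps tokens generated per reply; num_ctx sets the context
# window (Ollama's default is 2048 tokens).
PARAMETER num_predict 250
PARAMETER num_ctx 2048
```

Create it with `ollama create llama3.1-short -f Modelfile` and point the app at the new model name.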
-
@gilamonyet just did a big update: you no longer need to manually download the XTTS files, and Docker support for CPU was added, with an easier setup via docker compose. If the volume mapping in Docker is correct, it should now start and use PyAudio without issues.
-
Thanks for the update! I might give it a go in the next couple of days.
-
Hi,
First of all, I'll be straight and honest: I'm a complete newb/beginner regarding Docker, and I'm not a developer by any means. I stumbled across this GitHub repository because I was looking for ways to have a voice chat locally with an AI to mimic NPC conversations while playing video games.
After some trial and error, I managed to build the image via Docker Desktop, and I'm able to see the UI via localhost. However, when I try to talk to it, the logs show me the following error, and I have no clue how to fix it.
I tried asking ChatGPT (I know, don't blame me please) and it mentioned something about Docker not being able to find (or access) my local output device. I've checked my settings in Windows: my headphones are marked as the output device and my microphone is marked as the default input device.
I hope someone can help!
See the full log: