[WIP] Various Improvements #12

kahrendt · 2024-07-20T12:33:25Z

Media playback is improved:

A basic resampler adjusts sample rates (quality isn't great and this is slow)
Handles both mono and stereo media
Supports FLAC files
Adds volume/mute control via the DAC (the wheel works for increasing/decreasing volume)
Music Assistant streams work (both MP3 and FLAC), but since they require resampling, the audio quality isn't great
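
For readers unfamiliar with what a "basic resampler" and mono handling involve, here is a minimal sketch. It is not the code from this PR: the function names `mono_to_stereo` and `resample_linear` are hypothetical, and linear interpolation is assumed purely for illustration (the PR does not say which method its resampler uses).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: duplicate each mono sample into an interleaved L,R pair.
std::vector<int16_t> mono_to_stereo(const std::vector<int16_t> &mono) {
  std::vector<int16_t> stereo;
  stereo.reserve(mono.size() * 2);
  for (int16_t sample : mono) {
    stereo.push_back(sample);  // left
    stereo.push_back(sample);  // right
  }
  return stereo;
}

// Hypothetical helper: resample one channel by linear interpolation.
// Assumption: interpolation is used only as the simplest possible example.
std::vector<int16_t> resample_linear(const std::vector<int16_t> &in, uint32_t in_rate, uint32_t out_rate) {
  std::vector<int16_t> out;
  if (in.empty() || in_rate == 0 || out_rate == 0) {
    return out;
  }
  const size_t out_len = static_cast<size_t>(static_cast<uint64_t>(in.size()) * out_rate / in_rate);
  out.reserve(out_len);
  const double step = static_cast<double>(in_rate) / out_rate;  // input samples per output sample
  for (size_t i = 0; i < out_len; ++i) {
    const double pos = i * step;
    size_t idx = static_cast<size_t>(pos);
    if (idx >= in.size()) {
      idx = in.size() - 1;  // guard against floating-point overshoot
    }
    const double frac = pos - static_cast<double>(idx);
    const int16_t a = in[idx];
    // Past the last sample there is no neighbour, so hold the final value.
    const int16_t b = (idx + 1 < in.size()) ? in[idx + 1] : in[idx];
    // The cast truncates the interpolated value toward zero.
    out.push_back(static_cast<int16_t>(a + (b - a) * frac));
  }
  return out;
}
```

Interpolating without any filtering is cheap but lets aliasing and imaging through, which is one plausible reason a simple resampler sounds poor on music.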

Fixes a few issues:

Reduces the long delay before the voice assistant component marks the end of a finished TTS announcement
Stops the media player component from warning about delaying too long
Eliminates the speaker hiss when no audio is playing

This isn't very stable at the moment, hence the WIP PR.

Internal memory use is high, so decoding two streams could potentially hit issues
Changing media streams causes issues (memory not properly freed, can't start a second stream without two attempts)
Buffers are not optimized, especially on the announcement side (they are way too big for announcements, causing a delay in playing the response)
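
To see why an oversized buffer delays the response, it helps to convert a buffer size into the duration of audio it holds. The sketch below does that arithmetic. The function names and all sizes and formats mentioned are hypothetical and are not the values used in this PR.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes of PCM audio consumed per second for a given format.
constexpr uint64_t bytes_per_second(uint32_t sample_rate, uint8_t channels, uint8_t bytes_per_sample) {
  return static_cast<uint64_t>(sample_rate) * channels * bytes_per_sample;
}

// Milliseconds of audio that a buffer of `bytes` holds; returns 0 for an invalid format.
constexpr uint32_t buffer_duration_ms(size_t bytes, uint32_t sample_rate, uint8_t channels,
                                      uint8_t bytes_per_sample) {
  return bytes_per_second(sample_rate, channels, bytes_per_sample) == 0
             ? 0
             : static_cast<uint32_t>(static_cast<uint64_t>(bytes) * 1000 /
                                     bytes_per_second(sample_rate, channels, bytes_per_sample));
}
```

For example, `buffer_duration_ms(32768, 16000, 1, 2)` gives 1024: a 32 KiB buffer of 16 kHz mono 16-bit audio holds just over a second. That is only the worst case. How much of it shows up as delay depends on how full the buffer must be before playback starts.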

github-actions · 2024-07-20T12:39:57Z

Firmware built successfully! 🎉

Download and extract the firmware to install with https://web.esphome.io

Make sure to choose esphome-voice-kit-esp32s3/esphome-voice-kit-esp32s3.factory.bin.

github-actions · 2024-07-22T18:17:22Z

Firmware built successfully! 🎉

Download and extract the firmware to install with https://web.esphome.io

Make sure to choose esphome-voice-kit-esp32s3/esphome-voice-kit-esp32s3.factory.bin.

kahrendt · 2024-07-22T18:19:46Z

This new set of commits should improve stability, though internal memory usage is still too high for decoding and resampling two simultaneous streams. Main highlights:

Basic WAV header decoding (thanks @synesthesiam)
FLAC headers are decoded consistently
Able to stop announcements on the wake word (I will improve how this functions when I work on the input/mWW/voice assistant side of things)
Starting and stopping streams is a lot more stable
Initial support for playing back local files (thanks @jesserockz for your file external component). I quickly added a button to show how to call it, but it should be easy to adapt it into a timer beeper
Buffer sizes were increased/aligned, so it is much less likely to stutter after music playback has started (though it still happens at the beginning of a file on occasion)
I apply two Biquad filters when resampling. This greatly improves the sound quality when playing back typical music files (at 44.1 kHz or 48 kHz). It does seem to reduce the sound quality when downsampling files with low sample rates (say 22.05 kHz).
Modified mWW and voice assistant components to not use high frequency loopers, so another CPU core is available more often. This eliminates most stutters when decoding and mixing two audio files while wake word detection is going on (resampling is still the bottleneck here though)
Switched to using the left microphone stream. This seems a lot cleaner with less false activations/better end of speech detection. I am having trouble finding documentation from XMOS regarding why the left and right channels seem different.
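
As background on the biquad item above, here is a minimal sketch of two cascaded biquad sections. It is not the code from this PR. It assumes the filters are low-pass (a common anti-aliasing choice when resampling) and takes its coefficient formulas from the widely used Audio EQ Cookbook. The names `Biquad`, `make_lowpass`, and `process_cascade`, and any cutoff or Q values passed in, are hypothetical.

```cpp
#include <cmath>

// One second-order IIR section in Direct Form I, with coefficients normalised so a0 == 1.
struct Biquad {
  double b0 = 1.0, b1 = 0.0, b2 = 0.0;  // feed-forward coefficients
  double a1 = 0.0, a2 = 0.0;            // feedback coefficients
  double x1 = 0.0, x2 = 0.0;            // previous two inputs
  double y1 = 0.0, y2 = 0.0;            // previous two outputs

  double process(double x) {
    const double y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
    x2 = x1;
    x1 = x;
    y2 = y1;
    y1 = y;
    return y;
  }
};

// Low-pass coefficients following the Audio EQ Cookbook formulas.
// Assumption: cutoff_hz is below sample_rate / 2 and q is positive.
Biquad make_lowpass(double sample_rate, double cutoff_hz, double q) {
  const double pi = 3.14159265358979323846;
  const double w0 = 2.0 * pi * cutoff_hz / sample_rate;
  const double cos_w0 = std::cos(w0);
  const double alpha = std::sin(w0) / (2.0 * q);
  const double a0 = 1.0 + alpha;
  Biquad f;
  f.b0 = (1.0 - cos_w0) / 2.0 / a0;
  f.b1 = (1.0 - cos_w0) / a0;
  f.b2 = f.b0;
  f.a1 = -2.0 * cos_w0 / a0;
  f.a2 = (1.0 - alpha) / a0;
  return f;
}

// Run one sample through two sections in series, which steepens the roll-off.
double process_cascade(Biquad &first, Biquad &second, double x) {
  return second.process(first.process(x));
}
```

Each section keeps its own state, so stereo audio needs a separate pair of filters per channel. A low-pass filter like this passes steady (DC) input unchanged and suppresses content near half the sample rate.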

kahrendt added 18 commits July 20, 2024 06:52

first pass of framework for resampler

911c1cb

add resample header

9ee17ab

convert mono to stereo in ResamplerStreamer

5fea223

reduce timeout for announcing state

043dc1b

first pass at sending stream info/stereo output

54b428f

working stereo mp3s

fc6e871

working resampler!

576740d

use mp3s for TTS

2b13e8f

split most streamers into own files

5e3cacd

revert back to using pcm for tts

5a096ae

functioning flac decoder

01c2a3c

implement dac volume control

a4774b3

publish state after muting/unmuting

32da443

flac decoder pulls directly from ring buffer

8b9e8ea

update todos

4fbfbc9

add support for volume up and down commands

c341e6b

tweak some memory allocations

c33e4ec

add volume support

b787a62

kahrendt added 11 commits July 20, 2024 10:30

define i2c registers as static const

948a01a

mixing algorithm - consistent announcement loudness

e373139

clear specific ring bufers in mixer when stopping

df8cea2

stop announcement on wake word

6be021c

add some TODOs

97726c3

avoid high freq loopers to prevent stuttering

ffd3e30

apply biquad filters when resampling

65e6603

clean up code/variable names

6199d21

integrate synesthesiam's wav header parser

956b676

stop active pipeline before starting new stream

955ac91

uniform I2C function return behavior

efadd8d

kahrendt added 6 commits July 21, 2024 12:38

simplify announcement flag handling

c88329b

increase and align buffer sizes

a0eb385

fix edge case of trying to stop a stopping pipeline

366ea80

initial work for playing local media files

77834f3

update for release

aa06753

fix typo

ed89492

kahrendt merged commit 4c796e2 into dev Jul 22, 2024
3 checks passed

kahrendt deleted the kahrendt-2024-2 branch July 22, 2024 19:34

kahrendt restored the kahrendt-2024-2 branch July 22, 2024 19:54
