[WIP] Various Improvements #12

Merged
merged 35 commits into from
Jul 22, 2024

Conversation

kahrendt
Collaborator

@kahrendt kahrendt commented Jul 20, 2024

Media playback is improved:

  • A basic resampler adjusts sample rates (the quality isn't great, and it is slow)
  • Handles both mono and stereo media
  • Supports FLAC files
  • Adds volume/mute control via the DAC (the wheel works for increasing/decreasing volume)
  • Music Assistant streams work (both MP3 and FLAC), but because they require resampling, the audio quality isn't great
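
To illustrate what a "basic" resampler can look like, here is a minimal Python sketch using linear interpolation, plus a helper that duplicates mono samples into interleaved stereo. This is an assumption for illustration only, not the code in this PR: the function names `resample_linear` and `mono_to_stereo` are hypothetical, and the firmware itself is not written in Python. Linear interpolation is cheap to write but does no anti-alias filtering, which is one plausible reason a simple resampler sounds poor.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono sequence of samples using linear interpolation."""
    if not samples or src_rate == dst_rate:
        return list(samples)
    # Number of output samples covering the same duration as the input.
    out_len = len(samples) * dst_rate // src_rate
    # How far we advance in the input for each output sample.
    step = src_rate / dst_rate
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * step
        idx = int(pos)
        frac = pos - idx
        a = samples[idx]
        b = samples[min(idx + 1, last)]  # clamp at the final sample
        out.append(a + (b - a) * frac)
    return out


def mono_to_stereo(samples):
    """Duplicate each mono sample into an interleaved left/right pair."""
    stereo = []
    for s in samples:
        stereo.extend((s, s))
    return stereo
```

For example, `resample_linear([0, 10, 20, 30], 2, 4)` doubles the sample count by inserting midpoints, and `mono_to_stereo([1, 2])` returns `[1, 1, 2, 2]`.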

Fixes a few issues:

  • Reduces the long delay before the voice assistant component marks a TTS announcement as finished
  • Eliminates the media player component's warning about delaying too long
  • Eliminates the speaker hiss when no audio is playing

This isn't very stable at the moment, hence the WIP PR.

  • Internal memory use is high, so decoding two streams at once could run into problems
  • Changing media streams causes issues (memory is not properly freed, and starting a second stream takes two attempts)
  • Buffers are not optimized, especially on the announcement side (they are way too big for announcements, causing a delay in playing the response)
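
As a rough illustration of why an oversized buffer adds delay, the sketch below converts a buffer size in bytes into the amount of audio it holds. The helper name `buffer_duration_ms` and the example figures are assumptions, not values taken from this PR; the result is the worst-case wait if playback does not start until the buffer has filled.

```python
def buffer_duration_ms(buffer_bytes, sample_rate, channels=2, bytes_per_sample=2):
    """Return how many milliseconds of PCM audio fit in a buffer.

    Defaults assume 16-bit stereo PCM, which is an illustrative choice.
    """
    bytes_per_second = sample_rate * channels * bytes_per_sample
    return 1000.0 * buffer_bytes / bytes_per_second
```

With these hypothetical numbers, a 192,000-byte buffer of 16-bit stereo audio at 48 kHz holds a full second of audio (`buffer_duration_ms(192000, 48000)` returns `1000.0`), so an announcement queued behind a full buffer of that size could start up to a second late.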

Firmware built successfully! 🎉

Download and extract the firmware to install with https://web.esphome.io

Make sure to choose esphome-voice-kit-esp32s3/esphome-voice-kit-esp32s3.factory.bin.

@kahrendt
Collaborator Author

This new set of commits should improve stability, though internal memory usage is still too high for decoding and resampling two simultaneous streams. Main highlights:

  • Basic WAV header decoding (thanks @synesthesiam)
  • FLAC headers are decoded consistently
  • Able to stop announcements on the wake word (I will improve how this functions when I work on the input/mWW/voice assistant side of things)
  • Starting and stopping streams is a lot more stable
  • Initial support for playing back local files (thanks @jesserockz for your file external component). I quickly added a button to show how to call it, but it should be easy to adapt it into a timer beeper
  • Buffer sizes were increased/aligned, so it is much less likely to stutter after music playback has started (though it still happens at the beginning of a file on occasion)
  • I apply two biquad filters when resampling. This greatly improves the sound quality when playing back typical music files (at 44.1 kHz or 48 kHz). It does seem to reduce the sound quality when downsampling files with low sample rates (say 22.05 kHz).
  • Modified the mWW and voice assistant components to not use high-frequency loopers, so another CPU core is available more often. This eliminates most stutters when decoding and mixing two audio files while wake word detection is running (resampling is still the bottleneck here, though)
  • Switched to using the left microphone stream. This seems a lot cleaner, with fewer false activations and better end-of-speech detection. I am having trouble finding documentation from XMOS explaining why the left and right channels seem different.
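
The two-biquad filtering step mentioned above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the filter used in this PR: it assumes both stages are low-pass filters (the PR does not say which type is used), derives coefficients from the widely used Audio EQ Cookbook low-pass formulas, and uses hypothetical names (`Biquad`, `filter_cascade`) and an illustrative 8 kHz cutoff.

```python
import math


class Biquad:
    """Second-order IIR filter in transposed direct form II."""

    def __init__(self, b0, b1, b2, a1, a2):
        # Coefficients are already normalized so that a0 == 1.
        self.b0, self.b1, self.b2 = b0, b1, b2
        self.a1, self.a2 = a1, a2
        self.z1 = 0.0  # filter state
        self.z2 = 0.0

    @classmethod
    def lowpass(cls, cutoff_hz, sample_rate, q=0.7071):
        """Build a low-pass stage (Audio EQ Cookbook formulas)."""
        w0 = 2.0 * math.pi * cutoff_hz / sample_rate
        alpha = math.sin(w0) / (2.0 * q)
        cos_w0 = math.cos(w0)
        a0 = 1.0 + alpha
        return cls(
            b0=(1.0 - cos_w0) / 2.0 / a0,
            b1=(1.0 - cos_w0) / a0,
            b2=(1.0 - cos_w0) / 2.0 / a0,
            a1=-2.0 * cos_w0 / a0,
            a2=(1.0 - alpha) / a0,
        )

    def process(self, x):
        """Filter one sample and update the internal state."""
        y = self.b0 * x + self.z1
        self.z1 = self.b1 * x - self.a1 * y + self.z2
        self.z2 = self.b2 * x - self.a2 * y
        return y


def filter_cascade(samples, stages):
    """Run each sample through every stage in order."""
    out = []
    for x in samples:
        for stage in stages:
            x = stage.process(x)
        out.append(x)
    return out
```

A usage sketch: `filter_cascade(samples, [Biquad.lowpass(8000, 48000), Biquad.lowpass(8000, 48000)])` passes low frequencies essentially unchanged while suppressing content near the Nyquist frequency, which is the part that would otherwise alias when the sample rate is reduced. Cascading two stages gives a steeper roll-off than one, at the cost of extra per-sample work.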

@kahrendt kahrendt merged commit 4c796e2 into dev Jul 22, 2024
3 checks passed
@kahrendt kahrendt deleted the kahrendt-2024-2 branch July 22, 2024 19:34
@kahrendt kahrendt restored the kahrendt-2024-2 branch July 22, 2024 19:54