feat(audio): implement MKL-accelerated speech-to-text for Mac #328

Open
wants to merge 5 commits into
base: main
Changes from 3 commits
2 changes: 1 addition & 1 deletion .github/workflows/benchmark.yml
@@ -144,4 +144,4 @@ jobs:
# alert-threshold: "200%"
# comment-on-alert: true
# fail-on-alert: true
# alert-comment-cc-users: "@louis030195"
# alert-comment-cc-users: "@louis030195"
8 changes: 6 additions & 2 deletions .github/workflows/release-app.yml
@@ -40,9 +40,11 @@ jobs:
args: "--target x86_64-apple-darwin --features metal"
target: x86_64-apple-darwin
- platform: "ubuntu-22.04" # Ubuntu x86_64
args: "" # TODO CUDA, mkl
args: "--features mkl"
target: x86_64-unknown-linux-gnu
- platform: "windows-latest" # Windows x86_64
args: "--target x86_64-pc-windows-msvc" # TODO CUDA, mkl? --features "openblas"
args: "--target x86_64-pc-windows-msvc --features mkl"
target: x86_64-pc-windows-msvc
pre-build-args: "" # --openblas
# windows arm: https://github.com/ahqsoftwares/tauri-ahq-store/blob/2fbc2103c222662b3c6ee0cd71fcde664824f0ef/.github/workflows/publish.yml#L136

@@ -150,6 +152,8 @@ jobs:
export PKG_CONFIG_PATH="/usr/local/opt/ffmpeg/lib/pkgconfig:$PKG_CONFIG_PATH"
export PKG_CONFIG_ALLOW_CROSS=1
export RUSTFLAGS="-C link-arg=-Wl,-rpath,@executable_path/../Frameworks -C link-arg=-Wl,-rpath,@loader_path/../Frameworks -C link-arg=-Wl,-install_name,@rpath/libscreenpipe.dylib"
elif [[ "${{ matrix.platform }}" == "ubuntu-22.04" || "${{ matrix.platform }}" == "windows-latest" ]]; then
Collaborator

What does this do?

export RUSTFLAGS="-C target-cpu=native"
fi
cargo build --release ${{ matrix.args }}
ls -R target
4 changes: 3 additions & 1 deletion screenpipe-audio/Cargo.toml
@@ -80,10 +80,12 @@ criterion = { workspace = true }
memory-stats = "1.0"

[features]
default = ["metal"]
Collaborator

Why default to metal, and why remove cuda? Windows and Linux users don't want to use Metal.

metal = ["candle/metal", "candle-nn/metal", "candle-transformers/metal"]
cuda = ["candle/cuda", "candle-nn/cuda", "candle-transformers/cuda"]
mkl = ["candle/mkl", "candle-nn/mkl", "candle-transformers/mkl"]



[[bin]]
name = "screenpipe-audio"
path = "src/bin/screenpipe-audio.rs"
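
One way to address the reviewer's concern about Metal on Windows and Linux is to gate each backend on both its Cargo feature and the target OS. The sketch below is hypothetical: `preferred_backends` is not part of this PR, and it returns backend names as strings rather than constructing candle devices. It assumes only the `metal` and `cuda` feature names from the `[features]` table above.

```rust
/// Hypothetical helper (not in the PR): lists backends to try, in order.
/// `cfg!` is evaluated at compile time, so a disabled feature or a
/// non-matching OS contributes nothing to the list.
fn preferred_backends() -> Vec<&'static str> {
    let mut order = Vec::new();
    // Metal is only considered on macOS, even if the feature is enabled.
    if cfg!(all(feature = "metal", target_os = "macos")) {
        order.push("metal");
    }
    // CUDA is considered whenever the crate is built with `--features cuda`.
    if cfg!(feature = "cuda") {
        order.push("cuda");
    }
    // CPU is always available as the last resort.
    order.push("cpu");
    order
}
```

With this shape, a build that enables `metal` on Linux or Windows would still skip the Metal backend, and a build with no features enabled would yield only `"cpu"`.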
2 changes: 1 addition & 1 deletion screenpipe-audio/benches/stt_benchmark.rs
@@ -69,4 +69,4 @@ fn criterion_benchmark(c: &mut Criterion) {
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
criterion_main!(benches);
15 changes: 13 additions & 2 deletions screenpipe-audio/src/stt.rs
@@ -40,8 +40,8 @@ pub struct WhisperModel {
impl WhisperModel {
pub fn new(engine: Arc<AudioTranscriptionEngine>) -> Result<Self> {
debug!("Initializing WhisperModel");
let device = Device::new_metal(0).unwrap_or(Device::new_cuda(0).unwrap_or(Device::Cpu));
info!("device = {:?}", device);
let device = Self::get_optimal_device()?;
info!("Using device: {:?}", device);

debug!("Fetching model files");
let (config_filename, tokenizer_filename, weights_filename) = {
@@ -86,6 +86,17 @@ impl WhisperModel {
device,
})
}

fn get_optimal_device() -> Result<Device> {
Collaborator

Why was CUDA removed?

Author

Oh, it was an oversight. Thank you.

if let Ok(device) = Device::new_metal(0) {
info!("Using Metal GPU");
Ok(device)
} else {
info!("Metal not available, falling back to CPU");
Ok(Device::Cpu)
}
}

}

#[derive(Debug, Clone)]
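
The review thread above notes that the CUDA fallback was dropped from `get_optimal_device`. A minimal sketch of a Metal, then CUDA, then CPU chain follows. It is an illustration, not the author's fix: the `Device` enum is a local stand-in for candle's type, `pick_device` is a hypothetical name, and the probes are passed in as closures so the example compiles without candle.

```rust
/// Stand-in for candle's `Device`, defined locally for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Device {
    Cpu,
    Metal(usize),
    Cuda(usize),
}

/// Tries Metal first, then CUDA, then falls back to the CPU.
/// The probes are closures, so CUDA is only attempted when Metal fails.
fn pick_device<M, C>(try_metal: M, try_cuda: C) -> Device
where
    M: FnOnce() -> Result<Device, String>,
    C: FnOnce() -> Result<Device, String>,
{
    try_metal()
        .or_else(|_| try_cuda())
        .unwrap_or(Device::Cpu)
}
```

In the real code the probes would presumably wrap `Device::new_metal(0)` and `Device::new_cuda(0)`, whose error type is candle's own rather than `String`. Unlike the removed `unwrap_or(Device::new_cuda(0).unwrap_or(Device::Cpu))` expression, `or_else` evaluates the CUDA probe lazily.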