DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine that can run in real time on devices. It uses a model trained by machine learning techniques based on Baidu's Deep Speech research paper <https://arxiv.org/abs/1412.5567>. Project DeepSpeech uses Google's TensorFlow <https://www.tensorflow.org/> to make the implementation easier.

This is the source code of a customized DeepSpeech for project KanTV, derived from Mozilla's original DeepSpeech, with a few enhancements (I'm NOT a technical expert in the AI field):

refine the entire log subsystem
refine the build system
refine the Android examples and validate them on several of my phones (a Huawei phone with a HiSilicon SoC, and a Xiaomi 14 with the latest Qualcomm SoC: Qualcomm SM8650-AB Snapdragon 8 Gen 3 (4 nm))

The goals of this project are:

plain C/C++/Java implementation without dependencies
Android turn-key project for AI experts/ASR researchers and software developers

How to build project for target Android

prerequisites

Host OS information:

uname -a

Linux 5.8.0-43-generic #49~20.04.1-Ubuntu SMP Fri Feb 5 09:57:56 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

cat /etc/issue

Ubuntu 20.04.2 LTS \n \l

tools & utilities

sudo apt-get update
sudo apt-get install vim -y
sudo apt-get install net-tools -y
sudo apt-get install build-essential -y
sudo apt-get install cmake -y
sudo apt-get install curl -y
sudo apt-get install python -y
sudo apt-get install tcl expect -y
sudo apt-get install nginx -y
sudo apt-get install git -y
sudo apt-get install spawn-fcgi -y
sudo apt-get install u-boot-tools -y

sudo apt-get install libbrotli-dev -y
sudo apt-get install nasm -y
sudo apt-get install yasm -y
sudo apt-get install libass-dev -y
sudo apt-get install libx11-dev -y
sudo apt-get install libxcb-xinerama0-dev -y
sudo apt-get install libxfixes-dev -y
sudo apt-get install libxcb-xfixes0-dev -y
sudo apt-get install libxinerama-dev -y
sudo apt-get install libxcb-xinput-dev -y
sudo apt-get install libxi-dev -y
sudo apt-get install libasound2-dev -y
sudo apt-get install libxv-dev -y
sudo apt-get install libsdl2-dev -y
sudo apt install openjdk-17-jdk-headless -y

sudo apt-get install -y android-tools-adb android-tools-fastboot autoconf \
        automake bc bison build-essential ccache cscope curl device-tree-compiler \
        expect flex ftp-upload gdisk acpica-tools libattr1-dev libcap-dev \
        libfdt-dev libftdi-dev libglib2.0-dev libhidapi-dev libncurses5-dev \
        libssl-dev libtool make \
        mtools netcat python-crypto python3-crypto python-pyelftools \
        python3-pycryptodome python3-pyelftools python3-serial \
        rsync unzip uuid-dev xdg-utils xterm xz-utils zlib1g-dev

bazel

download and install bazel manually

  wget https://github.com/bazelbuild/bazel/releases/download/3.1.0/bazel-3.1.0-installer-linux-x86_64.sh

  chmod +x bazel-3.1.0-installer-linux-x86_64.sh

  sudo ./bazel-3.1.0-installer-linux-x86_64.sh
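
As a quick sanity check after installing, the installed Bazel version can be compared against the 3.1.0 release used above. The following is only a minimal sketch: the helper name bazel_version_is is hypothetical and not part of this project, and it assumes that `bazel --version` prints a single line of the form "bazel 3.1.0".

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): succeed only when the
# given `bazel --version` output reports the expected version.
# Assumption: the output looks like "bazel 3.1.0".
bazel_version_is() {
    expected="$1"
    reported="$2"
    [ "$reported" = "bazel $expected" ]
}

# Only query Bazel when it is actually on PATH.
if command -v bazel >/dev/null 2>&1; then
    if bazel_version_is "3.1.0" "$(bazel --version)"; then
        echo "bazel 3.1.0 found"
    else
        echo "unexpected bazel version: $(bazel --version)" >&2
    fi
fi
```

If a different version is reported, another Bazel installation earlier on PATH is a likely cause.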

Android NDK & Android Studio

download and install Android Studio and Android NDK-r21e manually

Android NDK-r21e(LTS)

Android Studio 4.2.1

Fetch source codes

git clone https://github.com/cdeos/DeepSpeech
cd DeepSpeech
git checkout kantv

Pre-Build

modify the build script (build/envsetup.sh) to match your local development environment

https://github.com/cdeos/DeepSpeech/blob/kantv/build/envsetup.sh#L34
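
Before editing and sourcing build/envsetup.sh, it can help to confirm that the host tools from the prerequisites are actually on PATH. This is a minimal sketch under stated assumptions: missing_tools is a hypothetical helper that does not exist in this project, and the tool list in the usage line is only an illustrative subset of what was installed above. The variables that envsetup.sh itself expects are not shown here.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): print every named
# command that is not on PATH and return non-zero if any is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}
```

A possible usage: `missing_tools git cmake bazel adb || echo "install the missing tools first"`.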

Build

step1: build all native code to generate the essential libs

. build/envsetup.sh
./build-all.sh

step2: build Android examples

build the Android examples with the latest Android Studio IDE

Post-Build

download the model files from the Mozilla DeepSpeech v0.9.3 release and push them to a real Android phone

wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer

wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.tflite

adb push deepspeech-0.9.3-models.tflite  /sdcard/
deepspeech-0.9.3-models.tflite: 1 file pushed. 17.4 MB/s (47331784 bytes in 2.598s)

adb push deepspeech-0.9.3-models.scorer  /sdcard/
deepspeech-0.9.3-models.scorer: 1 file pushed. 16.4 MB/s (953363776 bytes in 55.349s)
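
The byte counts that adb reports above (47331784 for the .tflite model, 953363776 for the scorer) can also be used to check that a download was not truncated before pushing it to the phone. A minimal sketch, assuming the files sit in the current directory; has_size is a hypothetical helper and not part of this project.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): succeed only when FILE
# exists and is exactly BYTES bytes long.
has_size() {
    file="$1"
    bytes="$2"
    [ -f "$file" ] || return 1
    # wc may pad its output with spaces on some systems, so strip them.
    actual="$(wc -c < "$file" | tr -d ' ')"
    [ "$actual" = "$bytes" ]
}

# Expected sizes are taken from the adb push output shown above.
for entry in \
    "deepspeech-0.9.3-models.tflite:47331784" \
    "deepspeech-0.9.3-models.scorer:953363776"; do
    name="${entry%%:*}"
    size="${entry##*:}"
    if has_size "$name" "$size"; then
        echo "ok: $name"
    else
        echo "missing or wrong size: $name"
    fi
done
```

A size match does not prove the file is intact, but a mismatch is a cheap signal that the download should be repeated.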

Run Android example on real Android phone

Support

Please do not send e-mail to me (I'm just an experienced Android system software developer who is interested in device-side AI applications, NOT an AI technical expert). Public technical discussion on GitHub is preferred.
Feel free to submit issues or feature requests (the focus is on Android at the moment); volunteer support will be provided if time permits.

Contribution

If you want to contribute to project DeepSpeech, be sure to review the open issues.

We use GitHub issues for tracking requests and bugs; please see how to submit an issue in this project.

whisper.cpp

whisper.cpp is a powerful and excellent open source ASR project. Compared with Mozilla's DeepSpeech on a real Android phone:

the ASR result of whisper.cpp is very accurate and much better than that of Mozilla's DeepSpeech
the real-time performance of Mozilla's DeepSpeech is good and much better than that of whisper.cpp

Acknowledgement

I have learnt a great deal from the open source community. Sincere thanks to all its contributors, especially the original authors and contributors of Linux, Android, FFmpeg and other excellent projects. I hope this project is of some use to the open source community.

License

Copyright (c) 2021 maintainer of Mozilla's DeepSpeech

Licensed under MPL-2.0

Copyright (c) 2021-2023 maintainer of Project KanTV

Licensed under Apache License 2.0 or later

Status/Defect

This project is no longer maintained.
