This is the code behind the politsAIkroonika project on Instagram and YouTube. With just a single command, you can produce a short video clip featuring a fictional crime news story in the style of a certain Estonian 90s TV show. The story, audio and video are all 100% AI-generated using various models. The Estonian text-to-speech model used for the news reporter's voice has been custom trained for maximum authenticity.
A brief overview of the process:
- Generate story title, summary and script using OpenAI GPT-3.5 and GPT-4
- Convert script to audio using Voice Cloning App by BenAAndrew
- Generate video clips to illustrate the story using ModelScope Text-to-Video
- Enhance the video using Topaz Video AI (optional but highly recommended - improves resolution and frame rate)
- Merge video clips, audio and subtitles using ffmpeg
- Upload to Google Drive (optional - for the convenience of sharing the clip)
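As an illustration of the final merge step, the ffmpeg invocation can be assembled along the following lines. This is a sketch only: the file names are made up for the example, and the real pipeline may use different flags.

```python
def build_merge_cmd(concat_list, audio, subtitles, output):
    """Assemble an ffmpeg command that concatenates the video clips,
    muxes in the narration audio and burns in the subtitles."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0", "-i", concat_list,  # text file listing the clips
        "-i", audio,                                      # narration track
        "-map", "0:v:0", "-map", "1:a:0",                 # video from input 0, audio from input 1
        "-vf", f"subtitles={subtitles}",                  # burn in subtitles (requires libass)
        "-c:v", "libx264", "-c:a", "aac",
        "-shortest", output,
    ]

cmd = build_merge_cmd("clips.txt", "narration.wav", "episode.srt", "episode.mp4")
print(" ".join(cmd))
```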
Installing the package and its dependencies is a bit more involved than usual due to the need to install and configure the Voice Cloning App and Topaz Video AI. The following instructions are for Windows, but should be easily adaptable to Linux.
- Python 3.8 or 3.9
- Poetry (tested with 1.6.1)
- ffmpeg
- NVIDIA GPU with at least 8 GB of VRAM (tested on a GTX 1070)
- OpenAI account with API key (API usage is paid, but an episode costs only a few cents)
Optional:
- Topaz Video AI (tested with version 3.2.0)
Whilst this is paid software, it is currently the best available option for frame interpolation and upscaling. Open-source options do exist (RIFE/CAIN/DAIN, etc.), but they would require additional development to integrate.
Clone the repository and install the dependencies using Poetry:
poetry install
Voice Cloning App is used for the text-to-speech functionality and is executed under its own virtual environment. This is because it requires specific versions of various libraries that may conflict with the versions required by this package.
Follow the manual install instructions here, except install the requirements into a virtual environment under `/Voice-Cloning-App/.venv`:
cd Voice-Cloning-App
python -m venv .venv
.venv\Scripts\activate # Or the Linux equivalent
pip install -r requirements.txt
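Because the Voice Cloning App lives in its own virtual environment, it has to be invoked through that environment's interpreter rather than imported directly. A minimal sketch of that pattern is shown below; the script name and its arguments are illustrative, not the actual Voice Cloning App entry point.

```python
import sys
from pathlib import Path

def venv_python(venv_dir):
    """Return the Python executable inside a virtual environment,
    accounting for the Windows (Scripts/) vs POSIX (bin/) layout."""
    venv = Path(venv_dir)
    if sys.platform == "win32":
        return venv / "Scripts" / "python.exe"
    return venv / "bin" / "python"

def tts_command(text, out_wav):
    # Hypothetical invocation; the real synthesis script and its
    # arguments differ - see the Voice-Cloning-App documentation.
    return [str(venv_python("Voice-Cloning-App/.venv")),
            "synthesize.py", "--text", text, "--output", out_wav]

# In real use this would be passed to subprocess.run(cmd, check=True)
print(tts_command("Tere õhtust", "line01.wav"))
```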
The code requires several environment variables to be configured. You may choose to set these in your system environment variables, or in a `.env` file in the root of the repository. For the latter option, you need to install the Poetry dotenv plugin.
The following environment variables are required:
- `OPENAI_API_KEY` - get from your OpenAI account (instructions here)
- The `ffmpeg` executable must be in your `PATH` variable
If you are using Topaz Video AI, the following environment variables are also required:
- `TVAI_MODEL_DIR` and `TVAI_MODEL_DATA_DIR` - set according to instructions here
- `TVAI_FFMPEG` - set to the path of the `ffmpeg` executable in your Topaz Video AI installation (e.g. `C:\Program Files\Topaz Labs LLC\Topaz Video AI\ffmpeg.exe`)
If you are using Google Drive, the following environment variable is also required:
- `GOOGLE_DRIVE_FOLDER_ID` - the ID of the Google Drive folder where the videos will be uploaded. This is a long string of letters and numbers that can be found in the URL of the folder in Google Drive.
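A quick way to verify the configuration is to check for the variables before running anything. The variable names below are taken from the lists above; the helper itself is not part of the project.

```python
import os

REQUIRED = ["OPENAI_API_KEY"]
TOPAZ = ["TVAI_MODEL_DIR", "TVAI_MODEL_DATA_DIR", "TVAI_FFMPEG"]
GDRIVE = ["GOOGLE_DRIVE_FOLDER_ID"]

def missing_vars(use_topaz=False, use_gdrive=False):
    """Return the names of environment variables that still need to be set."""
    wanted = list(REQUIRED)
    if use_topaz:
        wanted += TOPAZ
    if use_gdrive:
        wanted += GDRIVE
    return [name for name in wanted if not os.environ.get(name)]

print(missing_vars(use_topaz=True, use_gdrive=True))
```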
Once everything has been installed, you will need to download the models below and place them in the correct directories (relative to the Voice-Cloning-App directory). If the directories do not exist, create them.
- Voice model - download from here and place in `data/models/reporter`
- Vocoder model - download from here, rename from `g_02500000` to `model.pt` and place in `data/hifigan/vctk`
- Vocoder model config file - download from here and place in `data/hifigan/vctk`
- Alphabet file - copy from `alphabets/Estonian.txt` to `data/languages/Estonian` and rename to `alphabet.txt`
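The expected layout can be sanity-checked with a few lines before the first run. The paths follow the list above; the config file name is an assumption, as the actual downloaded file name may differ.

```python
from pathlib import Path

MODEL_FILES = [
    "data/models/reporter",                 # voice model directory
    "data/hifigan/vctk/model.pt",           # vocoder model (renamed from g_02500000)
    "data/hifigan/vctk/config.json",        # vocoder config (assumed file name)
    "data/languages/Estonian/alphabet.txt", # alphabet file
]

def missing_model_files(root="Voice-Cloning-App"):
    """Return the expected model paths that do not exist yet."""
    base = Path(root)
    return [p for p in MODEL_FILES if not (base / p).exists()]

print(missing_model_files())
```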
The text-to-video model is automatically downloaded by the code.
For Topaz Video AI, if you have a fresh install, you may need to run the GUI first to download the required models. Simply load a video file and process it using the same models that the code uses:
- Apollo v8 (`apo-8`) - frame interpolation
- Theia Fine Tune Detail v3 (`thd-3`) - upscaling
If everything has been installed correctly, you should be able to run the following command to generate a new episode:
poetry run python .\politsaikroonika\make_episode.py
The above command will generate a new episode using the default settings. You can customise the episode using various command line arguments. For example, to avoid the topics of animals, theft, stealing and robbery, and to include fireworks and "new year's celebration", you can run the following command:
poetry run python .\politsaikroonika\make_episode.py -v --interactive --avoid animals,theft,stealing,robbery --include fireworks --include "new year's celebration"
The `-v` flag is for verbose output, and the `--interactive` flag is for interactive mode, which prompts you to confirm the generated text parts before proceeding and lets you override them if you wish.
More information on the available command line arguments can be found by running:
poetry run python .\politsaikroonika\make_episode.py --help
If you are interested in training your own text-to-speech model, you can follow the instructions in the Voice Cloning App repository. For reference, the training data used for the Estonian model included over 1000 sentences with a total duration of around 1.5 hours. The training took approximately 2 days on a GTX 1070.
To gather the training data, I processed all publicly available clips of the original TV show and extracted the audio track. Then, I transcribed the audio using tekstiks.ee (with a fair amount of manual corrections) and used `split_audio.py` and various scripts under `scripts` to split the audio into individual sentences. Background noise was removed using OpenVINO's noise-suppression-poconetlike-0001 model. Finally, the audio was upsampled using NU-Wave2.
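The sentence-splitting step can be approximated with silence detection. The toy below operates on a plain list of amplitude values rather than audio files, and its thresholds are guesses; the actual `split_audio.py` works differently.

```python
def split_on_silence(samples, threshold=0.01, min_gap=3):
    """Split a sequence of amplitude values into segments separated by
    runs of at least `min_gap` near-silent samples. Silent samples are
    dropped, so each returned segment contains only audible values."""
    segments, current = [], []
    quiet_run = 0
    for s in samples:
        if abs(s) < threshold:
            quiet_run += 1
            # A long enough silent run closes the current segment
            if quiet_run >= min_gap and current:
                segments.append(current)
                current = []
        else:
            quiet_run = 0
            current.append(s)
    if current:
        segments.append(current)
    return segments

print(split_on_silence([0.5, 0.6, 0, 0, 0, 0.4, 0.3, 0, 0.2]))
```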