
Releases: fishaudio/fish-diffusion

v2.2.1

22 May 20:20
b317ead

In this version, we added a uniform denoiser (shallow diffusion) to further improve generation quality, both for diffusion models trained on poor datasets and for HiFiSinger. This is the latest stable release of V2.

V2.1.0

26 Mar 23:00
d259486

In this version, we added support for the HiFiSinger architecture (see configs/svc_hifisinger.py), which has the following advantages:

  • Inference is much faster than with DiffSVC.
  • It performs better on noisy samples (although its peak quality does not match DiffSVC).

We also added a loudness (power) embedding to this architecture, improving the model's expressiveness.

2023-03-29 Update:

We added a timbre (speaker) mixing feature that works with all existing models; just add the following to your inference command:

--speaker "speaker_a:0.5,speaker_b:0.5"
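The --speaker value is a comma-separated list of name:weight pairs. As a small illustration of the format (the parser below is only a sketch for this note, not the project's actual code):

```python
def parse_speaker_mix(spec: str) -> dict[str, float]:
    """Split a 'name:weight,name:weight' spec into a speaker-to-weight map."""
    mix = {}
    for part in spec.split(","):
        name, weight = part.rsplit(":", 1)  # split on the last ':' only
        mix[name.strip()] = float(weight)
    return mix

parse_speaker_mix("speaker_a:0.5,speaker_b:0.5")
# → {'speaker_a': 0.5, 'speaker_b': 0.5}
```

In the example above the two weights sum to 1, which is the natural choice for blending timbres.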

We released a HiFiSinger + ContentVec pre-trained model.

We strongly recommend fine-tuning with the attached config.

Model Info

  • Dataset Size: ~50 hours (M4Singer, OpenCpop, and In-House Data), 2.25x data augmentation
  • Feature Extractor: ContentVec
  • MD5: 45a84d1b626cbdb23f72042c7eac680f
  • Steps: 540k on a 2x3090 server
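To verify a download against the published MD5 (e.g. 45a84d1b626cbdb23f72042c7eac680f above), a chunked hash avoids loading the whole checkpoint into memory; the checkpoint filename in the comment is illustrative:

```python
import hashlib

def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 hex digest, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# md5_of_file("downloaded-checkpoint.ckpt")  # hypothetical filename; compare to the MD5 above
```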

This model is released under the CC-BY-NC-SA 4.0 license; please read it carefully before downloading.

V2.0.0

05 Mar 21:42
3201bd6

This version integrates so-vits-svc, which introduces many breaking changes.

We released a ContentVec pre-trained model.

We strongly recommend fine-tuning with the attached config, which enables many of the new features.

Model Info

  • Dataset Size: ~100 hours, ~100 singers (M4Singer, OpenSinger, OpenCpop, and In-House Data), 1.5x data augmentation
  • Vocoder: NSF HifiGAN 44.1 kHz (OpenVPI)
  • Feature Extractor: ContentVec
  • MD5: 64034133bdf05910210f2f08cbda65c6
  • Steps: 300k on a 2x3090 server

2023-03-17 Update:

We finished training and testing the FishAudio stable vocoder (based on NSF-HiFiGAN). It performs well between 60 and 1200 Hz and, in our tests, is a full upgrade over the original OpenVPI NSF-HiFiGAN vocoder.

How to use: download nsf_hifigan-stable-v1.zip and extract it into checkpoints.
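Programmatically, "extract into checkpoints" just means unzipping the release asset into the repository's checkpoints/ directory; a minimal sketch (the paths are assumptions about your local clone):

```python
import zipfile
from pathlib import Path

def install_vocoder(archive: str = "nsf_hifigan-stable-v1.zip",
                    dest: str = "checkpoints") -> list[str]:
    """Extract a vocoder release archive into the checkpoints directory
    and return the names of the top-level entries that now exist there."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return sorted(p.name for p in Path(dest).iterdir())
```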

For convenience, we also attached an OpenUTAU vocoder: nsf_hifigan-stable-v1.dsvocoder.

This model is released under the CC-BY-NC-SA 4.0 license; please read it carefully before downloading.

V1.12

28 Feb 19:58
a397be8

The current version has been tested by many users and should be stable.

In the next version, we will rework the dataset structure and add data augmentation (and possibly polish DiffSinger), which will lead to many breaking changes.

Besides that, we released a beta vocoder whose config is identical to OpenVPI's NSF-HiFiGAN. It performs better on both high and low notes (at the very least, it no longer cracks).

How to use: download nsf_hifigan-beta-v2-epoch-434.zip and extract it into checkpoints.

Note: this beta vocoder was trained on M4Singer and OpenCpop for about three days (on 2x 3090). We will train it on a higher-quality dataset and release a stable version later.

03-01 Update:

  • We trained the model for one extra day and provided an ONNX export.
  • For convenience, we also attached an OpenUTAU vocoder: nsf_hifigan-beta-v2-epoch-434.dsvocoder.

The pretrained vocoders are released under the Attribution-NonCommercial-ShareAlike 4.0 International license.

03-03 Update:

  • For your convenience, we attached ContentVec's content-vec-best-legacy-500.pt.

V1.11

10 Feb 04:22
ffb512c

We are happy to announce that a pre-trained model is now available: with only 30 minutes of audio data and about 15 minutes of fine-tuning (on a 3090), you can reproduce the timbre you want.

We recommend fine-tuning with the attached config; it changes the learning-rate scheduler and the step interval between saved checkpoints.

Model Info

  • Dataset Size: ~300 hours, ~600 singers (M4Singer, OpenSinger, OpenCpop, and In-House Data)
  • Vocoder: NSF HifiGAN 44.1 kHz (OpenVPI)
  • Feature Extractor: Chinese Hubert Soft with gate size 25
  • MD5: 9d88f1bbca34053919ee1ea8bd780a9b
  • Steps: 260k on a 4x RTX A6000 server

This model is released under the CC-BY-NC-SA 4.0 license; please read it carefully before downloading.

V1.1

04 Feb 10:01
7177ff5

The training and inference pipeline for SVC is now stable.

V1.2 Beta 0

04 Feb 22:40
a9fa937
Pre-release

We released a Chinese Aligned Whisper model, which gives SVC better articulation and robustness to slurred input.

Note: this model's accuracy is poor for languages other than Chinese.

Model details:
aligned-whisper-cn-25k-v1.ckpt

  • Base Model: Whisper Medium (~300M)
  • Aligned Embedding Dim: 256
  • Dataset: (Chinese) OpenCpop, OpenSinger, M4Singer
  • Trained on 2x3090 for 50 hours
  • MD5 checksum: 840dad46fadd2b1f8a324ef7209f9ee1

aligned-whisper-cn-40k-v1.1.ckpt

  • Trained for an extra 30 hours
  • This model aligns voice and phones more accurately
  • MD5 checksum: 90a6852d67b7dc01f9e8e0c86378ceef

The multilingual model has not been released yet.

The pretrained models are released under the Attribution-NonCommercial-ShareAlike 4.0 International license.

V1.0

23 Jan 21:21
cdb94a5

Continuous pitch embedding and Chinese Hubert Soft are now generally available.