so-vits-svc 4.0: Colab Flow

Reference

【so-vits-svc】手把手教你老婆唱歌 ("Teaching your waifu to sing, step by step") https://www.bilibili.com/video/BV1vM4y1S7zB/

soVITS3.0数据集准备 ("soVITS 3.0 dataset preparation") https://www.bilibili.com/read/cv20514221

https://github.com/innnky/so-vits-svc/tree/4.0

Colab Notebook

1. Preparing Training Data

Dataset Requirements

  • 60-100 slices of audio, in .wav format

  • Each slice should be around 4-8 seconds long

  • Sample rate should be 44100 Hz
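The requirements above can be checked automatically before uploading anything. The following is a minimal sketch using only the Python standard library. The function names and the idea of returning a list of warnings are illustrative assumptions, not part of so-vits-svc. Note that the `wave` module only reads PCM-encoded .wav files.

```python
import wave
from pathlib import Path

# Limits taken from the dataset requirements listed above.
TARGET_RATE = 44100            # Hz
MIN_SEC, MAX_SEC = 4.0, 8.0    # "around 4-8 seconds"
MIN_COUNT, MAX_COUNT = 60, 100


def inspect_wav(path):
    """Return (sample_rate, duration_in_seconds) for a PCM .wav file."""
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        return rate, wf.getnframes() / rate


def check_dataset(folder):
    """Return a list of warnings for a folder of .wav slices (hypothetical helper)."""
    warnings = []
    files = sorted(Path(folder).glob("*.wav"))
    if not MIN_COUNT <= len(files) <= MAX_COUNT:
        warnings.append(
            f"{len(files)} slices found, expected {MIN_COUNT}-{MAX_COUNT}"
        )
    for f in files:
        rate, seconds = inspect_wav(f)
        if rate != TARGET_RATE:
            warnings.append(f"{f.name}: sample rate is {rate} Hz")
        if not MIN_SEC <= seconds <= MAX_SEC:
            warnings.append(f"{f.name}: duration is {seconds:.1f} s")
    return warnings
```

Calling `check_dataset` on the folder that holds the slices returns an empty list when every requirement is met. Since the duration guideline is approximate, the duration warnings are best treated as hints rather than hard failures.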

Training Data

The radio story series of Majo no Tabitabi makes a decent training dataset:

https://www.bilibili.com/video/BV1d54y1m7cK/

To download the audio, use JiJiDown.

Data Preparation

We need to assemble the audio pieces together in Adobe Premiere Pro.

At the beginning and end of each episode, the volume of the background music is raised. For the best quality, we will trim those sections out.

Also, we can observe that the BGM volume never rises above -30 dB. We will therefore use this value as the noise floor later on.
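To get a feel for what -30 dB means in terms of sample values, the level can be converted to a linear amplitude. The sketch below assumes the meter reading is relative to full scale (dBFS), so that 0 dB corresponds to an amplitude of 1.0. The function names are illustrative.

```python
import math


def db_to_amplitude(db):
    """Convert a level in dB (relative to full scale) to a linear amplitude ratio."""
    return 10 ** (db / 20)


def amplitude_to_db(amplitude):
    """Inverse conversion: linear amplitude ratio (> 0) back to dB."""
    return 20 * math.log10(amplitude)


NOISE_FLOOR_DB = -30  # the BGM ceiling observed above
print(round(db_to_amplitude(NOISE_FLOOR_DB), 4))  # 0.0316
```

Under that assumption, anything quieter than roughly 3% of full scale would count as background rather than voice.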

Audio export settings:

Then, we need to slice the data into pieces. To do that, we can use openvpi/audio-slicer.
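To illustrate the general idea behind silence-based slicing, here is a simplified sketch in plain Python. It is not audio-slicer's actual implementation or API: the function names, the frame size, and the minimum silence length are assumptions chosen for illustration, and the default threshold simply reuses the -30 dB noise floor noted earlier. Samples are assumed to be floats in the range [-1, 1].

```python
import math


def frame_rms(samples, frame_len):
    """RMS of consecutive non-overlapping frames of `frame_len` samples."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def find_voiced_spans(samples, sample_rate, threshold_db=-30.0,
                      frame_ms=20, min_silence_ms=300):
    """Return (start, end) sample indices of spans separated by long silences."""
    frame_len = int(sample_rate * frame_ms / 1000)
    threshold = 10 ** (threshold_db / 20)           # dB -> linear amplitude
    min_silent_frames = max(1, min_silence_ms // frame_ms)
    loud = [rms > threshold for rms in frame_rms(samples, frame_len)]

    spans, start, silent_run = [], None, 0
    for i, is_loud in enumerate(loud):
        if is_loud:
            if start is None:
                start = i                           # a new span begins here
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silent_frames:
                end = i - silent_run + 1            # index of first silent frame
                spans.append((start * frame_len, end * frame_len))
                start, silent_run = None, 0
    if start is not None:                           # close a span still open at the end
        end = len(loud) - silent_run
        spans.append((start * frame_len, end * frame_len))
    return spans
```

Each returned pair can be used to cut `samples[start:end]` into its own slice. A real slicer also has to handle overly long spans and keep a little silence around each cut, which this sketch leaves out.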

Open a few audio files to verify that the audio has been sliced correctly.

We can also enable the "Length" property in the file browser to get a sense of the duration of each slice. To do so, right-click inside the folder, select Sort by -> More..., and check Length.

Finally, zip the dataset.
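Zipping can also be scripted. The sketch below uses `shutil.make_archive` from the standard library and keeps the slice folder as the top-level entry inside the archive. Whether the notebook expects that layout is an assumption, and the function name and default archive name are illustrative.

```python
import shutil
from pathlib import Path


def zip_dataset(folder, archive_name="dataset"):
    """Zip `folder` next to itself and return the path of the created archive."""
    folder = Path(folder).resolve()
    return shutil.make_archive(
        str(folder.parent / archive_name),  # output path without the ".zip" suffix
        "zip",
        root_dir=folder.parent,             # paths in the archive are relative to this
        base_dir=folder.name,               # only this folder is included
    )
```

For a folder named `speaker`, the resulting `dataset.zip` contains entries such as `speaker/0001.wav` (example names only).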

2. Setting Up the Training Environment
