Datasets Format
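Amphion supports the following academic datasets (sorted alphabetically): AudioCaps, CSD, KiSing, LibriTTS, LJSpeech, M4Singer, NUS-48E, Opencpop, OpenSinger, Opera, PopBuTFy, PopCS, PJS, SVCC, and VCTK.
The download link and file structure tree of each dataset are shown below.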
AudioCaps
AudioCaps is a dataset of around 44K audio-caption pairs, where each audio clip corresponds to a caption with rich semantic information. You can download the dataset here. The file structure tree is like:
[AudioCaps dataset path]
┣ AudioCpas
┃   ┣ wav
┃   ┃   ┣ ---1_cCGK4M_0_10000.wav
┃   ┃   ┣ ---lTs1dxhU_30000_40000.wav
┃   ┃   ┣ ...
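The clip names in the tree above appear to follow an `{ID}_{start}_{end}.wav` pattern. The following is a minimal sketch for splitting such a name. It assumes the two trailing numeric fields are the clip's start and end offsets, which is inferred from the example file names rather than stated by the dataset, and `parse_audiocaps_name` is a hypothetical helper, not part of Amphion. The name is split from the right because the ID part can itself contain underscores and hyphens.

```python
from pathlib import PurePath


def parse_audiocaps_name(filename):
    """Split '<id>_<start>_<end>.wav' into (id, start, end).

    Assumption: the two trailing numeric fields are clip offsets.
    """
    stem = PurePath(filename).stem
    # Split from the right: the ID part may itself contain underscores.
    clip_id, start, end = stem.rsplit("_", 2)
    return clip_id, int(start), int(end)


print(parse_audiocaps_name("---1_cCGK4M_0_10000.wav"))
```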
CSD
The official CSD dataset can be downloaded here. The file structure tree is like:
[CSD dataset path]
 ┣ english
 ┣ korean
 ┣ utterances
 ┃ ┣ en001a
 ┃ ┃ ┣ {UtteranceID}.wav
 ┃ ┣ en001b
 ┃ ┣ en002a
 ┃ ┣ en002b
 ┃ ┣ ...
 ┣ README
KiSing
The official KiSing dataset can be downloaded here. The file structure tree is like:
[KiSing dataset path]
 ┣ clean
 ┃ ┣ 421
 ┃ ┣ 422
 ┃ ┣ ...
LibriTTS
The official LibriTTS dataset can be downloaded here. The file structure tree is like:
[LibriTTS dataset path]
 ┣ BOOKS.txt
 ┣ CHAPTERS.txt
 ┣ eval_sentences10.tsv
 ┣ LICENSE.txt
 ┣ NOTE.txt
 ┣ reader_book.tsv
 ┣ README_librispeech.txt
 ┣ README_libritts.txt
 ┣ speakers.tsv
 ┣ SPEAKERS.txt
 ┣ dev-clean (Subset)
 ┃ ┣ 1272 {Speaker_ID}
 ┃ ┃ ┣ 128104 {Chapter_ID}
 ┃ ┃ ┃ ┣ 1272_128104_000001_000000.normalized.txt
 ┃ ┃ ┃ ┣ 1272_128104_000001_000000.original.txt
 ┃ ┃ ┃ ┣ 1272_128104_000001_000000.wav
 ┃ ┃ ┃ ┣ ...
 ┃ ┃ ┃ ┣ 1272_128104.book.tsv
 ┃ ┃ ┃ ┣ 1272_128104.trans.tsv
 ┃ ┃ ┣ ...
 ┃ ┣ ...
 ┣ dev-other (Subset)
 ┃ ┣ 116 {Speaker_ID}
 ┃ ┃ ┣ 288045 {Chapter_ID}
 ┃ ┃ ┃ ┣ 116_288045_000003_000000.normalized.txt
 ┃ ┃ ┃ ┣ 116_288045_000003_000000.original.txt
 ┃ ┃ ┃ ┣ 116_288045_000003_000000.wav
 ┃ ┃ ┃ ┣ ...
 ┃ ┃ ┃ ┣ 116_288045.book.tsv
 ┃ ┃ ┃ ┣ 116_288045.trans.tsv
 ┃ ┃ ┣ ...
 ┃ ┣ ...
 ┣ test-clean (Subset)
 ┃ ┣ {Speaker_ID}
 ┃ ┃ ┣ {Chapter_ID}
 ┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
 ┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
 ┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
 ┃ ┃ ┃ ┣ ...
 ┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
 ┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
 ┃ ┃ ┣ ...
 ┃ ┣ ...
 ┣ test-other
 ┃ ┣ {Speaker_ID}
 ┃ ┃ ┣ {Chapter_ID}
 ┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
 ┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
 ┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
 ┃ ┃ ┃ ┣ ...
 ┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
 ┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
 ┃ ┃ ┣ ...
 ┃ ┣ ...
 ┣ train-clean-100
 ┃ ┣ {Speaker_ID}
 ┃ ┃ ┣ {Chapter_ID}
 ┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
 ┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
 ┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
 ┃ ┃ ┃ ┣ ...
 ┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
 ┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
 ┃ ┃ ┣ ...
 ┃ ┣ ...
 ┣ train-clean-360
 ┃ ┣ {Speaker_ID}
 ┃ ┃ ┣ {Chapter_ID}
 ┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
 ┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
 ┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
 ┃ ┃ ┃ ┣ ...
 ┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
 ┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
 ┃ ┃ ┣ ...
 ┃ ┣ ...
 ┣ train-other-500
 ┃ ┣ {Speaker_ID}
 ┃ ┃ ┣ {Chapter_ID}
 ┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
 ┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
 ┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
 ┃ ┃ ┃ ┣ ...
 ┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
 ┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
 ┃ ┃ ┣ ...
 ┃ ┣ ...
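Because every LibriTTS utterance file encodes its speaker and chapter in the file name, the IDs and the sibling transcript paths can be recovered from a `.wav` path alone. The sketch below illustrates this with a hypothetical helper, `parse_libritts_utterance`, which is not part of Amphion. It relies only on the naming pattern shown in the tree above.

```python
from pathlib import Path


def parse_libritts_utterance(wav_path):
    """Return IDs and sibling transcript paths for one LibriTTS wav file."""
    wav_path = Path(wav_path)
    # Names look like {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav, and the
    # utterance ID itself contains an underscore, so split at most twice.
    speaker_id, chapter_id, utterance_id = wav_path.stem.split("_", 2)
    return {
        "speaker_id": speaker_id,
        "chapter_id": chapter_id,
        "utterance_id": utterance_id,
        "normalized_txt": wav_path.with_name(wav_path.stem + ".normalized.txt"),
        "original_txt": wav_path.with_name(wav_path.stem + ".original.txt"),
    }


info = parse_libritts_utterance(
    "dev-clean/1272/128104/1272_128104_000001_000000.wav"
)
print(info["speaker_id"], info["chapter_id"], info["utterance_id"])
```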
LJSpeech
The official LJSpeech dataset can be downloaded here. The file structure tree is like:
[LJSpeech dataset path]
 ┣ metadata.csv
 ┣ wavs
 ┃ ┣ LJ001-0001.wav
 ┃ ┣ LJ001-0002.wav
 ┃ ┣ ...
 ┣ README
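Each entry in `metadata.csv` can be linked to its audio file under `wavs`. The sketch below assumes the file uses the pipe-separated, three-field layout (utterance ID, transcription, normalized transcription) described in the dataset's README. Verify this against your copy before relying on it. `read_ljspeech_metadata` is a hypothetical helper, and the sample row is made up for illustration.

```python
import csv
import io


def read_ljspeech_metadata(fileobj):
    """Yield (utterance_id, transcription, normalized_transcription) rows."""
    # Fields are pipe-separated and unquoted, so quote handling is disabled.
    reader = csv.reader(fileobj, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) == 3:  # skip malformed or empty lines
            yield tuple(row)


# Illustrative row only; real rows come from metadata.csv.
sample = io.StringIO("LJ001-0001|Printed in 1450|Printed in fourteen fifty\n")
for utt_id, text, normalized in read_ljspeech_metadata(sample):
    print(f"wavs/{utt_id}.wav", "->", normalized)
```

For a real dataset, the same function can be fed `open(path, encoding="utf-8", newline="")` in place of the in-memory sample.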
M4Singer
The official M4Singer dataset can be downloaded here. The file structure tree is like:
[M4Singer dataset path]
 ┣ {Singer_1}#{Song_1}
 ┃ ┣ 0000.mid
 ┃ ┣ 0000.TextGrid
 ┃ ┣ 0000.wav
 ┃ ┣ ...
 ┣ {Singer_1}#{Song_2}
 ┣ ...
 ┣ {Singer_2}#{Song_1}
 ┣ {Singer_2}#{Song_2}
 ┣ ...
 ┗ meta.json
NUS-48E
The official NUS-48E dataset can be downloaded here. The file structure tree is like:
[NUS-48E dataset path]
 ┣ {SpeakerID}
 ┃ ┣ read
 ┃ ┃ ┣ {SongID}.txt
 ┃ ┃ ┣ {SongID}.wav
 ┃ ┃ ┣ ...
 ┃ ┣ sing
 ┃ ┃ ┣ {SongID}.txt
 ┃ ┃ ┣ {SongID}.wav
 ┃ ┃ ┣ ...
 ┣ ...
 ┣ README.txt
Opencpop
The official Opencpop dataset can be downloaded here. The file structure tree is like:
[Opencpop dataset path]
 ┣ midis
 ┃ ┣ 2001.midi
 ┃ ┣ 2002.midi
 ┃ ┣ 2003.midi
 ┃ ┣ ...
 ┣ segments
 ┃ ┣ wavs
 ┃ ┃ ┣ 2001000001.wav
 ┃ ┃ ┣ 2001000002.wav
 ┃ ┃ ┣ 2001000003.wav
 ┃ ┃ ┣ ...
 ┃ ┣ test.txt
 ┃ ┣ train.txt
 ┃ ┗ transcriptions.txt
 ┣ textgrids
 ┃ ┣ 2001.TextGrid
 ┃ ┣ 2002.TextGrid
 ┃ ┣ 2003.TextGrid
 ┃ ┣ ...
 ┣ wavs
 ┃ ┣ 2001.wav
 ┃ ┣ 2002.wav
 ┃ ┣ 2003.wav
 ┃ ┣ ...
 ┣ TERMS_OF_ACCESS
 ┗ readme.md
OpenSinger
The official OpenSinger dataset can be downloaded here. The file structure tree is like:
[OpenSinger dataset path]
 ┣ ManRaw
 ┃ ┣ {Singer_1}_{Song_1}
 ┃ ┃ ┣ {Singer_1}_{Song_1}_0.lab
 ┃ ┃ ┣ {Singer_1}_{Song_1}_0.txt
 ┃ ┃ ┣ {Singer_1}_{Song_1}_0.wav
 ┃ ┃ ┣ ...
 ┃ ┣ {Singer_1}_{Song_2}
 ┃ ┣ ...
 ┣ WomanRaw
 ┣ LICENSE
 ┗ README.md
Opera
The official Opera dataset can be downloaded here. The file structure tree is like:
[Opera dataset path]
 ┣ monophonic
 ┃ ┣ chinese
 ┃ ┃ ┣ {Gender}_{SingerID}
 ┃ ┃ ┃ ┣ {Emotion}_{SongID}.wav
 ┃ ┃ ┃ ┣ ...
 ┃ ┃ ┣ ...
 ┃ ┣ western
 ┣ polyphonic
 ┃ ┣ chinese
 ┃ ┣ western
 ┣ CrossculturalDataSet.xlsx
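In the Opera layout, the directory levels and file name together carry the labels for each recording. As a rough sketch, a hypothetical helper `parse_opera_path` (not part of Amphion) could recover them from a relative path. It assumes the gender and emotion labels contain no underscore, and that the other subtrees mirror the `monophonic/chinese` branch shown above. Neither assumption is confirmed by the tree, and the example path uses made-up labels.

```python
from pathlib import PurePosixPath


def parse_opera_path(wav_path):
    """Parse .../{texture}/{region}/{Gender}_{SingerID}/{Emotion}_{SongID}.wav."""
    path = PurePosixPath(wav_path)
    # The three directories directly above the file carry the labels.
    texture, region, singer_dir = path.parts[-4:-1]
    # Assumption: gender and emotion labels contain no underscore.
    gender, singer_id = singer_dir.split("_", 1)
    emotion, song_id = path.stem.split("_", 1)
    return {
        "texture": texture,
        "region": region,
        "gender": gender,
        "singer_id": singer_id,
        "emotion": emotion,
        "song_id": song_id,
    }


# Hypothetical path; the label values are placeholders.
print(parse_opera_path("monophonic/chinese/female_01/happy_03.wav"))
```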
PopBuTFy
The official PopBuTFy dataset can be downloaded here. The file structure tree is like:
[PopBuTFy dataset path]
 ┣ data
 ┃ ┣ {SingerID}#singing#{SongName}_Amateur
 ┃ ┃ ┣ {SingerID}#singing#{SongName}_Amateur_{UtteranceID}.mp3
 ┃ ┃ ┣ ...
 ┃ ┣ {SingerID}#singing#{SongName}_Professional
 ┃ ┃ ┣ {SingerID}#singing#{SongName}_Professional_{UtteranceID}.mp3
 ┃ ┃ ┣ ...
 ┣ text_labels
 ┗ TERMS_OF_ACCESS
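PopBuTFy file names combine the singer, song, skill level, and utterance in a single string. The sketch below splits a name following the pattern in the tree above. `parse_popbutfy_name` is a hypothetical helper, not part of Amphion. It assumes the level and utterance ID never contain underscores, so a song name with underscores is still handled by splitting from the right.

```python
from pathlib import PurePath


def parse_popbutfy_name(filename):
    """Split '{SingerID}#singing#{SongName}_{Level}_{UtteranceID}.mp3'."""
    stem = PurePath(filename).stem
    # The middle field is the literal word "singing" and is discarded.
    singer_id, _, rest = stem.split("#", 2)
    # Assumption: level and utterance ID contain no underscore.
    song_name, level, utterance_id = rest.rsplit("_", 2)
    return singer_id, song_name, level, utterance_id


# Hypothetical file name built from the pattern above.
print(parse_popbutfy_name("Singer01#singing#My_Song_Amateur_3.mp3"))
```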
PopCS
The official PopCS dataset can be downloaded here. The file structure tree is like:
[PopCS dataset path]
 ┣ popcs
 ┃ ┣ popcs-{SongName}
 ┃ ┃ ┣ {UtteranceID}_ph.txt
 ┃ ┃ ┣ {UtteranceID}_wf0.wav
 ┃ ┃ ┣ {UtteranceID}.TextGrid
 ┃ ┃ ┣ {UtteranceID}.txt
 ┃ ┃ ┣ ...
 ┃ ┣ ...
 ┗ TERMS_OF_ACCESS
PJS
The official PJS dataset can be downloaded here. The file structure tree is like:
[PJS dataset path]
 ┣ PJS_corpus_ver1.1
 ┃ ┣ background_noise
 ┃ ┣ pjs{SongID}
 ┃ ┃ ┣ pjs{SongID}_song.wav
 ┃ ┃ ┣ pjs{SongID}_speech.wav
 ┃ ┃ ┣ pjs{SongID}.lab
 ┃ ┃ ┣ pjs{SongID}.mid
 ┃ ┃ ┣ pjs{SongID}.musicxml
 ┃ ┃ ┣ pjs{SongID}.txt
 ┃ ┣ ...
SVCC
The official SVCC dataset can be downloaded here. The file structure tree is like:
[SVCC dataset path]
 ┣ Data
 ┃ ┣ CDF1
 ┃ ┃ ┣ 10001.wav
 ┃ ┃ ┣ 10002.wav
 ┃ ┃ ┣ ...
 ┃ ┣ CDM1
 ┃ ┣ IDF1
 ┃ ┣ IDM1
 ┗ README.md
VCTK
The official VCTK dataset can be downloaded here. The file structure tree is like:
[VCTK dataset path]
 ┣ txt
 ┃ ┣ {Speaker_1}
 ┃ ┃ ┣ {Speaker_1}_001.txt
 ┃ ┃ ┣ {Speaker_1}_002.txt
 ┃ ┃ ┣ ...
 ┃ ┣ {Speaker_2}
 ┃ ┣ ...
 ┣ wav48_silence_trimmed
 ┃ ┣ {Speaker_1}
 ┃ ┃ ┣ {Speaker_1}_001_mic1.flac
 ┃ ┃ ┣ {Speaker_1}_001_mic2.flac
 ┃ ┃ ┣ {Speaker_1}_002_mic1.flac
 ┃ ┃ ┣ ...
 ┃ ┣ {Speaker_2}
 ┃ ┣ ...
 ┣ speaker-info.txt
 ┗ update.txt
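In the VCTK layout, the two microphone recordings of an utterance share one transcript under `txt`. The following minimal sketch maps an audio path to its transcript path, relative to the dataset root. `vctk_transcript_for` is a hypothetical helper, not part of Amphion. It assumes speaker IDs contain no underscore, and the speaker ID in the example is illustrative.

```python
from pathlib import PurePosixPath


def vctk_transcript_for(flac_path):
    """Map wav48_silence_trimmed/{spk}/{spk}_{n}_mic{k}.flac to its transcript."""
    flac_path = PurePosixPath(flac_path)
    # Assumption: the stem has exactly three underscore-separated fields.
    speaker, utterance, _mic = flac_path.stem.split("_")
    return PurePosixPath("txt") / speaker / f"{speaker}_{utterance}.txt"


print(vctk_transcript_for("wav48_silence_trimmed/p225/p225_001_mic1.flac"))
```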
