TED Talk Parallel Corpus

This is a provisional home for the XLIFF version of the TED Talk Parallel Corpus. The corpus was created as a training data resource for the International Workshop on Spoken Language Translation 2013 and consists of volunteer transcriptions and translations from the TED web site. Crawling and preparation of the corpus were carried out by Mauro Cettolo of the Fondazione Bruno Kessler (FBK).

The corpus was converted from the original release into XLIFF format for use as training data for the CASMACAT Home Edition.
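As a rough illustration of how segment pairs might be read back out of an XLIFF file, here is a minimal Python sketch. It assumes the files follow the XLIFF 1.2 layout (the `urn:oasis:names:tc:xliff:document:1.2` namespace, with `trans-unit` elements holding one `source` and one `target` each); this page does not state the XLIFF version, so check a downloaded file first. The function name `read_segments` and the inline sample are hypothetical and not part of the corpus.

```python
import xml.etree.ElementTree as ET

# Assumption: the corpus files use the XLIFF 1.2 namespace; adjust if they differ.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def read_segments(xliff_text):
    """Return a list of (source, target) string pairs from XLIFF 1.2 text."""
    root = ET.fromstring(xliff_text)
    pairs = []
    for unit in root.iterfind(".//x:trans-unit", XLIFF_NS):
        source = unit.find("x:source", XLIFF_NS)
        target = unit.find("x:target", XLIFF_NS)
        if source is None or target is None:
            continue  # skip units that lack one side of the pair
        # itertext() also collects text nested inside inline markup elements
        pairs.append(("".join(source.itertext()), "".join(target.itertext())))
    return pairs


# Hypothetical two-segment sample, not taken from the corpus.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file source-language="en" target-language="de" datatype="plaintext" original="sample">
    <body>
      <trans-unit id="1">
        <source>Thank you.</source>
        <target>Danke.</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Good morning.</source>
        <target>Guten Morgen.</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

if __name__ == "__main__":
    for src, tgt in read_segments(SAMPLE):
        print(src, "|", tgt)
```

For a downloaded file, the same function can be applied to the file's contents, and the length of the returned list can be compared with the segment count listed for that language pair.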

XLIFF Files

en-ar (155047 Segments)
en-de (145947 Segments)
en-es (160185 Segments)
en-fa (81872 Segments)
en-fr (162681 Segments)
en-it (161701 Segments)
en-nl (148117 Segments)
en-pl (151288 Segments)
en-pt (158251 Segments)
en-ro (160776 Segments)
en-ru (135669 Segments)
en-sl (15231 Segments)
en-tr (139045 Segments)
en-zh (156811 Segments)