News Commentary Parallel Corpus v11 (2016)

This is the home of the News Commentary Parallel Corpus. The corpus was created as a training data resource for the Conference for Statistical Machine Translation Evaluation Campaign and consists of political and economic commentary crawled from the web site Project Syndicate.

The data is provided as is. No claims of intellectual property are made on the work of preparation of the corpus.

The corpus is provided in XLIFF format, to be used as training data for the CASMACAT Home Edition.
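
For readers who want to inspect the segments outside CASMACAT, the following is a minimal sketch of pulling source/target pairs out of an XLIFF file with the Python standard library. It assumes XLIFF 1.2-style markup, where each segment is a trans-unit element holding one source and one target child; the version used by the corpus files is not stated here, so check a file before relying on this. The function names and the two-segment sample are hypothetical and are not taken from the corpus.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a leading '{namespace}' so matching ignores the declared namespace."""
    return tag.rsplit("}", 1)[-1]


def read_segments(xliff_text):
    """Return (source, target) text pairs from XLIFF 1.2-style markup.

    Assumption: segments are stored as <trans-unit> elements with
    <source> and <target> children. Units missing either side are skipped.
    """
    root = ET.fromstring(xliff_text)
    pairs = []
    for elem in root.iter():
        if local_name(elem.tag) != "trans-unit":
            continue
        source = target = None
        for child in elem:
            name = local_name(child.tag)
            if name == "source":
                source = "".join(child.itertext()).strip()
            elif name == "target":
                target = "".join(child.itertext()).strip()
        if source is not None and target is not None:
            pairs.append((source, target))
    return pairs


# Hypothetical two-segment sample for illustration only.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file source-language="de" target-language="en" datatype="plaintext" original="sample">
    <body>
      <trans-unit id="1"><source>Guten Morgen.</source><target>Good morning.</target></trans-unit>
      <trans-unit id="2"><source>Danke.</source><target>Thank you.</target></trans-unit>
    </body>
  </file>
</xliff>"""

for src, tgt in read_segments(SAMPLE):
    print(src, "|", tgt)
```

For a file on disk, the same function can be applied to the file's contents; counting the returned pairs gives a quick check against the segment counts listed for each language pair.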

The corpus is also available in the following formats:

XLIFF Files


cs-ar (147682 Segments)

cs-en (191432 Segments)

cs-zh (190809 Segments)

de-ar (194006 Segments)

de-cs (185579 Segments)

de-en (242770 Segments)

de-fr (200497 Segments)

de-nl (24538 Segments)

de-zh (243655 Segments)

en-ar (233239 Segments)

en-nl (22998 Segments)

en-zh (288488 Segments)

es-ar (215693 Segments)

es-cs (183267 Segments)

es-de (227566 Segments)

es-en (260059 Segments)

es-fr (212149 Segments)

es-it (46895 Segments)

es-ja (1666 Segments)

es-nl (24223 Segments)

es-pt (28245 Segments)

es-ru (193866 Segments)

es-zh (271863 Segments)

fr-ar (194887 Segments)

fr-cs (159933 Segments)

fr-en (228946 Segments)

fr-nl (23902 Segments)

fr-zh (244832 Segments)

it-ar (49088 Segments)

it-cs (34909 Segments)

it-de (43759 Segments)

it-en (45794 Segments)

it-fr (43436 Segments)

it-nl (18076 Segments)

it-zh (60741 Segments)

ja-ar (1539 Segments)

ja-cs (1734 Segments)

ja-de (1613 Segments)

ja-en (1788 Segments)

ja-fr (1458 Segments)

ja-zh (2321 Segments)

nl-ar (26602 Segments)

nl-cs (19983 Segments)

nl-zh (35335 Segments)

pt-ar (31901 Segments)

pt-cs (20655 Segments)

pt-de (24348 Segments)

pt-en (29210 Segments)

pt-fr (28111 Segments)

pt-it (13055 Segments)

pt-nl (12300 Segments)

pt-zh (44040 Segments)

ru-ar (164835 Segments)

ru-cs (170597 Segments)

ru-de (189833 Segments)

ru-en (196245 Segments)

ru-fr (173602 Segments)

ru-it (31824 Segments)