Newsroom Article Summaries Dataset Released


In Spring 2018, the Cx Oath lab released the largest dataset of article summaries to date. The Newsroom dataset consists of 1.3 million article summaries and was designed for training and evaluating automatic summarization systems. The summaries were written in the newsrooms of 38 major publications between 1998 and 2017 and show a wide variety of summarization styles. The dataset is available along with tools to explore the data, compare summarization styles across publications and over time, and evaluate the performance of existing state-of-the-art automatic summarization systems.


The Newsroom dataset and its accompanying paper were presented at the 2018 conference of the North American Chapter of the Association for Computational Linguistics (NAACL) by authors Max Grusky, Mor Naaman, and Yoav Artzi.



© 2019 The Connected Experiences Lab. 
