10.4121/uuid:61fb9665-40ab-4b70-8214-767c521cc950

URL

Metadata

Title and subtitles of Wikipedia articles Dataset

David Sanchez-Charles,
This dataset contains 871 Wikipedia articles (retrieved on 8 August 2016) selected from the list of featured articles (https://en.wikipedia.org/wiki/Wikipedia:Featured_articles) in the 'Media', 'Literature and Theater', 'Music biographies', 'Media biographies', 'History biographies' and 'Video gaming' categories. For each article, the document structure, i.e. the titles of its sections and subsections, is extracted. The dataset also contains a proposed clustering of the event names, intended to make the Wikipedia articles easier to compare.
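
The record does not describe how the section structure was extracted. As an illustration only, the following minimal Python sketch shows one way a section outline could be recovered from raw wikitext, assuming headings follow the usual `== Title ==` convention. The function name `extract_outline` and the sample text are hypothetical and are not part of the dataset.

```python
import re

# Assumption: headings are written in standard wikitext form, e.g.
# "== History ==" (level 2) or "=== Early life ===" (level 3).
HEADING_RE = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def extract_outline(wikitext):
    """Return (level, title) pairs for each heading, in document order."""
    return [(len(m.group(1)), m.group(2)) for m in HEADING_RE.finditer(wikitext)]


# Hypothetical sample article body, not taken from the dataset.
sample = """Intro text.
== Early life ==
Some text.
=== Education ===
More text.
== Career ==
"""

outline = extract_outline(sample)
for level, title in outline:
    # Indent subsections under their parent section.
    print("  " * (level - 2) + title)
```

A flat list of (level, title) pairs keeps the nesting implicit in the level number, which is enough to compare outlines across articles without building a tree.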

Citation