The SSIX corpora: three gold standard corpora for sentiment analysis in English, Spanish and German financial microblogs

Date
2018-05-07
Author
Gaillat, Thomas
Zarrouk, Manel
Freitas, André
Davis, Brian
Recommended Citation
Gaillat, Thomas, Zarrouk, Manel, Freitas, André, & Davis, Brian. (2018). The SSIX corpora: three gold standard corpora for sentiment analysis in English, Spanish and German financial microblogs. Paper presented at the 11th edition of the Language Resources and Evaluation Conference (LREC 2018), Miyazaki, Japan, 7-12 May.
Abstract
This paper introduces the three SSIX corpora for sentiment analysis. These corpora address the need for annotated data for supervised learning methods. They focus on stock-market-related messages extracted from two financial microblogging platforms, StockTwits and Twitter. In total, they include 2,886 messages with opinion targets. Each message carries polarity annotations on a continuous scale, assigned by three or four experts in each language. The annotations identify the targets and assign each one a sentiment score. The annotation process comprises manual annotation, which is then verified and consolidated by financial experts. To maximize data quality, the corpora were created using principled sampling strategies, and inter-annotator agreement was taken into account before consolidation.
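The abstract does not specify the score range, the agreement measure, or the consolidation rule. As a minimal sketch only, assuming a continuous scale of [-1, 1], consolidation by averaging, and a hypothetical `max_spread` threshold for flagging disagreement, one target's scores from several annotators could be consolidated like this. The names `TargetAnnotation` and `consolidate` are illustrative and are not taken from the SSIX release.

```python
from dataclasses import dataclass
from statistics import mean


# Hypothetical record layout: one opinion target in one message, scored by
# several annotators on an ASSUMED continuous scale of [-1.0, 1.0].
@dataclass
class TargetAnnotation:
    message_id: str
    target: str
    scores: list[float]  # one score per annotator (three or four per language)


def consolidate(ann: TargetAnnotation, max_spread: float = 0.5) -> tuple[float, bool]:
    """Return the mean score and whether the item should go to expert review.

    max_spread is a hypothetical threshold: if the highest and lowest
    annotator scores differ by more than this, the item is flagged
    instead of being silently averaged.
    """
    if not ann.scores:
        raise ValueError("no annotator scores")
    for s in ann.scores:
        if not -1.0 <= s <= 1.0:
            raise ValueError(f"score {s} outside assumed [-1, 1] scale")
    spread = max(ann.scores) - min(ann.scores)
    return mean(ann.scores), spread > max_spread


if __name__ == "__main__":
    agreed = TargetAnnotation("m1", "$AAPL", [0.6, 0.7, 0.5])
    disputed = TargetAnnotation("m2", "$TSLA", [0.8, -0.2, 0.1])
    print(consolidate(agreed))    # close agreement, not flagged
    print(consolidate(disputed))  # wide spread, flagged for review
```

Flagging items by score spread, rather than simply averaging them, reflects the abstract's point that agreement is checked before experts consolidate the annotations. The actual SSIX procedure may use a different agreement statistic.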