CoFiF: A corpus of financial reports in French language

Date
2019-08-12
Author
Ahmadi, Sina
Daudert, Tobias
Usage
This item's downloads: 119
Recommended Citation
Ahmadi, Sina, & Daudert, Tobias. (2019). CoFiF: A corpus of financial reports in French language. Paper presented at the First Workshop on Financial Technology and Natural Language Processing (FinNLP), Macao, China, 12 August, https://doi.org/10.13025/zjf2-fn10
Abstract
In an era when machine learning and artificial intelligence have huge momentum, the demand for data to train and test models is growing steadily. We introduce CoFiF, the first corpus comprising company reports in the French language. It contains over 188 million tokens in 2,655 reports, covering reference documents and annual, semi-annual and quarterly reports. Our main focus is on the 60 largest French companies, listed in France's main stock indices, the CAC 40 and the CAC Next 20. The corpus spans more than 20 years, from 1995 to 2018. To evaluate this novel collection of organizational writing, we use CoFiF to train two character-level language models, a forward and a backward one, which we use to demonstrate the corpus's potential for business, economics, and management research in the French language.
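The abstract does not describe the architecture of the two language models. The toy sketch below only illustrates what "forward" and "backward" mean at the character level: the forward model predicts each character from the one before it, and the backward model does the same on the reversed text. It uses a smoothed character-bigram counter as a stand-in, not the authors' method, and all function names (`train_char_bigram`, `next_char_prob`, `log_likelihood`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_char_bigram(text, reverse=False):
    """Count character bigrams; with reverse=True the text is modelled right-to-left."""
    if reverse:
        text = text[::-1]  # a backward model is just a forward model on reversed text
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts


def next_char_prob(counts, prev, nxt, alphabet_size, alpha=1.0):
    """Add-alpha smoothed estimate of P(nxt | prev)."""
    row = counts.get(prev, Counter())  # .get avoids inserting unseen keys
    return (row[nxt] + alpha) / (sum(row.values()) + alpha * alphabet_size)


def log_likelihood(counts, text, alphabet_size, reverse=False):
    """Sum of log P(next char | previous char) in the model's reading direction."""
    if reverse:
        text = text[::-1]
    return sum(
        math.log(next_char_prob(counts, p, n, alphabet_size))
        for p, n in zip(text, text[1:])
    )


# Hypothetical usage on a short French snippet (not taken from CoFiF).
sample = "le chiffre d'affaires consolidé progresse"
forward = train_char_bigram(sample)
backward = train_char_bigram(sample, reverse=True)
alphabet = len(set(sample))
print(log_likelihood(forward, sample, alphabet))
print(log_likelihood(backward, sample, alphabet, reverse=True))
```

Training the backward model on reversed text means both directions can share the same training code. Each model then sees a different context for every position (left context versus right context).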