
dc.contributor.author: Daudert, Tobias
dc.contributor.author: Ahmadi, Sina
dc.date.accessioned: 2019-07-19T11:02:10Z
dc.date.issued: 2019-08-12
dc.identifier.citation: Daudert, Tobias, & Ahmadi, Sina. (2019). NUIG at the FinSBD Task: Sentence boundary detection for noisy financial PDFs in English and French. Paper presented at the First Workshop on Financial Technology and Natural Language Processing (FinNLP@IJCAI2019), Macao, China, 12 August, https://doi.org/10.13025/yzq2-dr94 [en_IE]
dc.identifier.uri: http://hdl.handle.net/10379/15277
dc.description.abstract: Portable Document Format (PDF) has become the industry-standard document format, as it is independent of software, hardware, and operating system. Publicly listed companies annually publish a variety of reports, and they too take advantage of PDF. This has led to a rise in PDFs containing valuable financial information and to a demand for approaches able to accurately extract this data. Analyzing and mining this information requires a challenging extraction phase, particularly with respect to document structure. In this paper, we describe a sentence boundary detection approach capable of extracting complete sentences from unstructured lists of tokens. Our approach is based on the application of a language model and a sequence classifier for both English and French. The results show good performance, achieving F1 scores of 0.855 and 0.91, and placed our team 3rd and 5th for French and English, respectively. [en_IE]
dc.format: application/pdf [en_IE]
dc.language.iso: en [en_IE]
dc.publisher: NUI Galway [en_IE]
dc.relation.ispartof: FinSBD-2019 Shared Task Sentence Boundary Detection in PDF Noisy Text in the Financial Domain [en]
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Ireland
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/3.0/ie/
dc.subject: Portable Document Format (PDF) [en_IE]
dc.subject: Sentence boundary detection [en_IE]
dc.subject: English [en_IE]
dc.subject: French [en_IE]
dc.subject: PDF [en_IE]
dc.subject: Noisy Text [en_IE]
dc.subject: Financial Domain [en_IE]
dc.title: NUIG at the FinSBD Task: Sentence boundary detection for noisy financial PDFs in English and French [en_IE]
dc.type: Workshop paper [en_IE]
dc.date.updated: 2019-07-13T19:39:02Z
dc.identifier.doi: 10.13025/yzq2-dr94
dc.local.publishedsource: https://doi.org/10.13025/yzq2-dr94
dc.description.peer-reviewed: peer-reviewed
dc.contributor.funder: European Regional Development Fund [en_IE]
dc.contributor.funder: Science Foundation Ireland [en_IE]
dc.description.embargo: 2019-08-12
dc.internal.rssid: 16784805
dc.local.contact: Sina Ahmadi, The Insight Centre For Data Analytics, National University Of Ireland, Galway, The Deri Building. Email: s.ahmadi1@nuigalway.ie
dc.local.copyrightchecked: Yes
dc.local.version: PUBLISHED
dcterms.project: info:eu-repo/grantAgreement/SFI/SFI Research Centres/12/RC/2289/IE/INSIGHT - Irelands Big Data and Analytics Research Centre/ [en_IE]
nui.item.downloads: 150



Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Ireland