dc.contributor.author | Daudert, Tobias | |
dc.contributor.author | Ahmadi, Sina | |
dc.date.accessioned | 2019-07-19T11:02:10Z | |
dc.date.issued | 2019-08-12 | |
dc.identifier.citation | Daudert, Tobias, & Ahmadi, Sina. (2019). NUIG at the FinSBD Task: Sentence boundary detection for noisy financial PDFs in English and French. Paper presented at the First Workshop on Financial Technology and Natural Language Processing (FinNLP@IJCAI2019), Macao, China, 12 August, https://doi.org/10.13025/yzq2-dr94 | en_IE |
dc.identifier.uri | http://hdl.handle.net/10379/15277 | |
dc.description.abstract | Portable Document Format (PDF) has become the industry-standard document format because it is independent of software, hardware, and operating system. Publicly listed companies publish a variety of annual reports and likewise rely on PDF. This has led to a rise in PDFs containing valuable financial information and to a demand for approaches that can accurately extract this data. Analyzing and mining this information requires a challenging extraction phase, particularly with respect to document structure. In this paper, we describe a sentence boundary detection approach capable of extracting complete sentences from unstructured lists of tokens. Our approach applies a language model and a sequence classifier to both English and French. The results show good performance, with F1 scores of 0.855 and 0.91, placing our team 3rd and 5th for French and English, respectively. | en_IE |
dc.format | application/pdf | en_IE |
dc.language.iso | en | en_IE |
dc.publisher | NUI Galway | en_IE |
dc.relation.ispartof | FinSBD-2019 Shared Task: Sentence Boundary Detection in PDF Noisy Text in the Financial Domain | en |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 Ireland | |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/3.0/ie/ | |
dc.subject | Portable Document Format (PDF) | en_IE |
dc.subject | Sentence boundary detection | en_IE |
dc.subject | English | en_IE |
dc.subject | French | en_IE |
dc.subject | PDF | en_IE |
dc.subject | Noisy Text | en_IE |
dc.subject | Financial Domain | en_IE |
dc.title | NUIG at the FinSBD Task: Sentence boundary detection for noisy financial PDFs in English and French | en_IE |
dc.type | Workshop paper | en_IE |
dc.date.updated | 2019-07-13T19:39:02Z | |
dc.identifier.doi | 10.13025/yzq2-dr94 | |
dc.local.publishedsource | https://doi.org/10.13025/yzq2-dr94 | |
dc.description.peer-reviewed | peer-reviewed | |
dc.contributor.funder | European Regional Development Fund | en_IE |
dc.contributor.funder | Science Foundation Ireland | en_IE |
dc.description.embargo | 2019-08-12 | |
dc.internal.rssid | 16784805 | |
dc.local.contact | Sina Ahmadi, The Insight Centre for Data Analytics, National University of Ireland, Galway, The DERI Building. Email: s.ahmadi1@nuigalway.ie | |
dc.local.copyrightchecked | Yes | |
dc.local.version | PUBLISHED | |
dcterms.project | info:eu-repo/grantAgreement/SFI/SFI Research Centres/12/RC/2289/IE/INSIGHT - Irelands Big Data and Analytics Research Centre/ | en_IE |