
dc.contributor.author: Muñoz, Emir
dc.contributor.author: Costabello, Luca
dc.contributor.author: Vandenbussche, Pierre-Yves
dc.date.accessioned: 2016-09-15T08:50:57Z
dc.date.available: 2016-09-15T08:50:57Z
dc.date.issued: 2014
dc.identifier.citation: Muñoz, Emir, Costabello, Luca, & Vandenbussche, Pierre-Yves. (2014). µRaptor: a DOM-based system with appetite for hCard elements. Paper presented at the Proceedings of the Second International Conference on Linked Data for Information Extraction - Volume 1267, Riva del Garda, Italy. [en_IE]
dc.identifier.uri: http://hdl.handle.net/10379/6021
dc.description.abstract: This paper describes µRaptor, a DOM-based method to extract hCard microformats from HTML pages stripped of microformat markup. µRaptor extracts DOM sub-trees, converts them into rules, and uses them to extract hCard microformats. Besides, we use co-occurring CSS classes to improve the overall precision. Results on train data show 0.96 precision and 0.83 F1 measure by considering only the most common tree patterns. Furthermore, we propose the adoption of additional constraint rules on the values of hCard elements to further improve the extraction. [en_IE]
dc.description.sponsorship: This work has been supported by KI2NA project funded by Fujitsu Laboratories Limited and Insight Centre for Data Analytics at NUI Galway (formerly known as DERI Galway). [en_IE]
dc.format: application/pdf [en_IE]
dc.language.iso: en [en_IE]
dc.publisher: CEUR-WS.org
dc.relation.ispartof: LD4IE@ISWC [en]
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Ireland
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/3.0/ie/
dc.subject: Data analytics
dc.subject: µRaptor
dc.subject: DOM-based system
dc.subject: hCard
dc.title: µRaptor: A DOM-based system with appetite for hCard elements [en_IE]
dc.type: Workshop paper [en_IE]
dc.date.updated: 2016-09-13T13:30:49Z
dc.local.publishedsource: http://dl.acm.org/citation.cfm?id=2878575.2878583 [en_IE]
dc.description.peer-reviewed: peer-reviewed
dc.contributor.funder: |~|1267880|~|
dc.internal.rssid: 11398948
dc.local.contact: Emir Munoz, DERI, IDA Business Park, Lower Dangan, NUI Galway. Email: e.munoz1@nuigalway.ie
dc.local.copyrightchecked: Yes
dc.local.version: ACCEPTED
nui.item.downloads: 310




Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Ireland