An efficient regular expression inference approach for relevant image extraction

dc.authorid 0000-0003-4351-2244
dc.contributor.author Agun, Hayri Volkan
dc.contributor.author Uzun, Erdinc
dc.date.accessioned 2026-02-12T21:05:33Z
dc.date.available 2026-02-12T21:05:33Z
dc.date.issued 2023
dc.department Bursa Teknik Üniversitesi
dc.description.abstract Traditional approaches for automatically extracting relevant images from web pages are error-prone and time-consuming. To improve this task, web data extraction approaches rely on operations such as preparing larger datasets and finding new features; however, these operations are difficult and laborious. In this study, we propose a fully automated approach based on the alignment of regular expressions to extract the relevant images from web pages. Automatically constructed regular expressions are applied to a classification task for the first time. To this end, a multi-stage inference approach is developed for generating regular expressions from the attribute values of relevant and irrelevant image elements in web pages. The proposed approach reduces the complexity of aligning two regular expressions by applying a constraint to a variant of the Levenshtein distance algorithm. The classification accuracy of the regular expression approach is compared with that of naive Bayes, logistic regression, J48, and multilayer perceptron classifiers on a balanced relevant image retrieval dataset of 360 image element samples from 10 shopping websites. In cross-validation, the regular expression inference-based classifier achieved an F-measure of 0.98 using only 5 frequent n-grams and outperformed the other classifiers on the same feature set. The classification time of the proposed approach was measured at 0.108 ms, which is very competitive with the other classifiers. © 2023 Elsevier B.V. All rights reserved.
dc.identifier.doi 10.1016/j.asoc.2023.110030
dc.identifier.issn 1568-4946
dc.identifier.issn 1872-9681
dc.identifier.scopus 2-s2.0-85149807859
dc.identifier.scopusquality Q1
dc.identifier.uri https://doi.org/10.1016/j.asoc.2023.110030
dc.identifier.uri https://hdl.handle.net/20.500.12885/7029
dc.identifier.volume 135
dc.identifier.wos WOS:000967879100001
dc.identifier.wosquality Q1
dc.indekslendigikaynak Web of Science
dc.indekslendigikaynak Scopus
dc.language.iso en
dc.publisher Elsevier
dc.relation.ispartof Applied Soft Computing
dc.relation.publicationcategory Article - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rights info:eu-repo/semantics/closedAccess
dc.snmz KA_WoS_20260212
dc.subject Web image extraction
dc.subject Regular expression inference
dc.subject Feature extraction
dc.subject Text classification
dc.title An efficient regular expression inference approach for relevant image extraction
dc.type Article
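
The abstract says the method constrains a version of the Levenshtein distance algorithm to reduce the cost of aligning two regular expressions, but the record does not state which constraint is used. The following is a minimal sketch under stated assumptions, not the authors' multi-stage inference procedure. It assumes a diagonal band (only cells with |i - j| <= band are computed, so time drops from O(n·m) to roughly O(n·band)). It also aligns two plain attribute strings instead of two regular expressions and generalizes each mismatched run into a `.*` wildcard. The function names `banded_alignment`, `merge_to_regex`, and `is_relevant`, the `band` parameter, and the example `src` values are all hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

INF = float("inf")


def banded_alignment(a: str, b: str, band: int) -> Optional[List[Tuple[str, str]]]:
    """Unit-cost Levenshtein alignment restricted to cells with |i - j| <= band.

    Returns (char_from_a, char_from_b) pairs, where '' marks a gap, or None
    when the end cell (len(a), len(b)) lies outside the band.
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        return None
    # Only in-band cells are stored; missing cells behave as infinite cost.
    dist: Dict[Tuple[int, int], float] = {(0, 0): 0}
    for i in range(n + 1):
        for j in range(max(0, i - band), min(m, i + band) + 1):
            if i == 0 and j == 0:
                continue
            best = INF
            if i > 0 and j > 0:  # match (cost 0) or substitution (cost 1)
                best = dist.get((i - 1, j - 1), INF) + (a[i - 1] != b[j - 1])
            if i > 0:  # deletion of a[i - 1]
                best = min(best, dist.get((i - 1, j), INF) + 1)
            if j > 0:  # insertion of b[j - 1]
                best = min(best, dist.get((i, j - 1), INF) + 1)
            dist[(i, j)] = best

    # Trace back from the end cell, preferring diagonal moves on ties.
    pairs: List[Tuple[str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        here = dist[(i, j)]
        if i > 0 and j > 0 and here == dist.get((i - 1, j - 1), INF) + (a[i - 1] != b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and here == dist.get((i - 1, j), INF) + 1:
            pairs.append((a[i - 1], ""))
            i -= 1
        else:
            pairs.append(("", b[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def merge_to_regex(a: str, b: str, band: int = 3) -> Optional[str]:
    """Generalize two attribute values into one pattern: shared characters are
    kept literally, and each run of substitutions or gaps becomes '.*'."""
    pairs = banded_alignment(a, b, band)
    if pairs is None:
        return None
    parts: List[str] = []
    in_gap = False
    for x, y in pairs:
        if x == y:
            parts.append(re.escape(x))
            in_gap = False
        elif not in_gap:
            parts.append(".*")
            in_gap = True
    return "".join(parts)


def is_relevant(attr_value: str, pattern: str) -> bool:
    """Label an attribute value relevant if the whole value matches the pattern."""
    return re.fullmatch(pattern, attr_value) is not None


pattern = merge_to_regex("/img/p/1042_large.jpg", "/img/p/877_large.jpg")
print(pattern)                                      # prints /img/p/.*_large\.jpg
print(is_relevant("/img/p/5_large.jpg", pattern))   # prints True
print(is_relevant("/img/logo.png", pattern))        # prints False
```

The band is what makes the alignment cheap: strings whose lengths differ by more than `band` are rejected immediately, and only a strip of cells around the diagonal is filled. Merging a third sample would require aligning an existing pattern against a new string, which this sketch does not attempt.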
