
dc.contributor.author        Onan, A
dc.date.accessioned          2020-07-01T08:28:17Z
dc.date.available            2020-07-01T08:28:17Z
dc.date.issued               APR
dc.date.issued               2016
dc.identifier.uri            http://hdl.handle.net/20.500.12481/5950
dc.description.abstract      Web page classification is an important research direction in web mining. The vast amount of data available on the web makes it essential to develop efficient and robust models for web mining tasks. Web page classification is the process of assigning a web page to a particular predefined category based on labelled data. It supports several other web mining tasks, such as focused web crawling, web link analysis and contextual advertising. Machine learning and data mining methods have been successfully applied to several web mining tasks, including web page classification. Multiple classifier systems, which combine several classifiers by differentiating the base classifiers and/or the dataset distributions so that more robust classification models can be built, are a promising research direction in machine learning. This paper presents a comparative analysis of four feature selection methods (correlation, consistency, information gain and chi-square-based feature selection) and four ensemble learning methods (Boosting, Bagging, Dagging and Random Subspace), combined with four base learners (naive Bayes, the k-nearest neighbour algorithm, the C4.5 algorithm and the FURIA algorithm). The article examines the predictive performance of ensemble methods for web page classification. The experimental results indicate that feature selection and ensemble learning can enhance the predictive performance of classifiers in web page classification. For the DMOZ-50 dataset, the highest average predictive performance (88.1%) is obtained by combining consistency-based feature selection with AdaBoost and naive Bayes, which is a promising result for web page classification. The experimental results also indicate that the Bagging and Random Subspace ensemble methods and the correlation-based and consistency-based feature selection methods obtain better results in terms of accuracy.
dc.title                     Classifier and feature set ensembles for web page classification
dc.title.alternative         JOURNAL OF INFORMATION SCIENCE
dc.identifier.DOI-ID         10.1177/0165551515591724
dc.identifier.volume         42
dc.identifier.issue          2
dc.identifier.startpage      150
dc.identifier.endpage        165
dc.identifier.issn/e-issn    0165-5515
dc.identifier.issn/e-issn    1741-6485
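
The abstract names several feature selection and ensemble methods without showing how they fit together. The following is a minimal Python sketch under stated assumptions, not the author's pipeline (the abstract reports its best result with consistency-based selection, AdaBoost and naive Bayes). It combines two of the named techniques: information-gain feature ranking, and Bagging over a Bernoulli naive Bayes base learner with majority voting. The toy pages, labels and all function and class names are hypothetical, and only the Python standard library is used.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term appears on the page'."""
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part) for part in (with_t, without_t) if part)
    return entropy(labels) - conditional


def select_top_k(docs, labels, k):
    """Rank every term by information gain and keep the k best (ties: alphabetical)."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)
    return ranked[:k]


class BernoulliNaiveBayes:
    """Naive Bayes over binary term-presence features with Laplace smoothing."""

    def fit(self, docs, labels, features):
        self.features = features
        self.classes = sorted(set(labels))
        counts = Counter(labels)
        self.log_prior = {c: math.log(counts[c] / len(labels)) for c in self.classes}
        self.p_term = {}
        for c in self.classes:
            docs_c = [d for d, y in zip(docs, labels) if y == c]
            # P(term present | class), smoothed so it is never exactly 0 or 1
            self.p_term[c] = {
                f: (sum(f in d for d in docs_c) + 1) / (len(docs_c) + 2) for f in features
            }
        return self

    def predict(self, doc):
        def log_posterior(c):
            score = self.log_prior[c]
            for f in self.features:
                p = self.p_term[c][f]
                score += math.log(p if f in doc else 1.0 - p)
            return score

        return max(self.classes, key=log_posterior)


def bagging(docs, labels, features, n_models=11, seed=0):
    """Train n_models base learners, each on a bootstrap sample of the training set."""
    rng = random.Random(seed)
    n = len(docs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n pages with replacement
        sample_docs = [docs[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        models.append(BernoulliNaiveBayes().fit(sample_docs, sample_labels, features))
    return models


def vote(models, doc):
    """Combine the ensemble by unweighted majority vote."""
    return Counter(m.predict(doc) for m in models).most_common(1)[0][0]


# Hypothetical toy data: each web page is reduced to the set of terms it contains.
pages = [
    {"match", "goal", "team", "news"},
    {"league", "team", "score", "news"},
    {"goal", "coach", "team"},
    {"physics", "lab", "experiment", "news"},
    {"cell", "lab", "research"},
    {"experiment", "research", "physics"},
]
labels = ["sports", "sports", "sports", "science", "science", "science"]

features = select_top_k(pages, labels, k=3)
models = bagging(pages, labels, features, n_models=11, seed=0)
print(features)
print(vote(models, {"team", "goal", "news"}))
```

Bagging varies the dataset distribution through bootstrap samples; training each model on a random subset of `features` instead of a bootstrap sample of pages would turn this into a Random Subspace ensemble. An odd `n_models` avoids tied votes in this two-class example.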


Files in this item:


There are no files associated with this item.

