Satire identification in Turkish news articles based on ensemble of classifiers
Abstract
Social media and microblogging platforms generally contain elements of figurative and nonliteral language, including satire. The identification of figurative language is a fundamental task for sentiment analysis: sentiment analysis methods cannot achieve high classification accuracy if elements of figurative language have not been properly identified. Satirical text is a kind of figurative language in which irony and humor are used to ridicule or criticize an event or entity. Satirical news is pervasive on social media platforms and can be deceptive and harmful. This paper presents an ensemble scheme for identifying satirical news in Turkish news articles. In the presented scheme, five linguistic and psychological feature sets (i.e. linguistic, psychological, personal, spoken categories, and punctuation) have been extracted. In the classification phase, the accuracy rates of five supervised learning algorithms (i.e. naive Bayes, logistic regression, support vector machines, random forest, and k-nearest neighbor) combined with three widely used ensemble methods (i.e. AdaBoost, bagging, and random subspace) have been compared. Based on the results, we concluded that the random forest algorithm yielded the highest performance, with a classification accuracy of 96.92% for satire detection in Turkish. Among deep learning-based architectures, a recurrent neural network with an attention mechanism achieved a classification accuracy of 97.72%.
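Bagging and random subspace, two of the ensemble methods named above, differ only in what each base learner is trained on. Bagging draws rows with replacement; random subspace keeps every row but restricts each learner to a random subset of features. The following minimal, standard-library Python sketch illustrates that difference under stated assumptions. It is not the authors' implementation. The nearest-centroid base learner is a hypothetical stand-in for the five algorithms the paper evaluates. The toy four-dimensional vectors are made-up placeholders for the extracted feature sets. The function names are illustrative only, and AdaBoost is omitted.

```python
import random
from collections import Counter
from typing import Callable, Iterable, List, Sequence

Vector = List[float]
Predictor = Callable[[Vector], int]


def majority_vote(votes: Iterable[int]) -> int:
    """Return the most frequent label among the base learners' votes."""
    return Counter(votes).most_common(1)[0][0]


def nearest_centroid_fit(X: Sequence[Vector], y: Sequence[int]) -> Predictor:
    """Hypothetical stand-in base learner: one mean vector (centroid) per class."""
    counts = Counter(y)
    sums = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    centroids = {c: [v / counts[c] for v in acc] for c, acc in sums.items()}

    def predict(x: Vector) -> int:
        # Assign the label whose centroid is closest in squared Euclidean distance.
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return predict


def bagging(X, y, n_estimators=11, seed=0) -> Predictor:
    """Bagging: each base learner sees a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(nearest_centroid_fit([X[i] for i in idx], [y[i] for i in idx]))
    return lambda x: majority_vote(m(x) for m in models)


def random_subspace(X, y, n_estimators=11, subspace_size=2, seed=0) -> Predictor:
    """Random subspace: each base learner sees all rows but only a random subset of features."""
    rng = random.Random(seed)
    dims = len(X[0])
    members = []
    for _ in range(n_estimators):
        feats = sorted(rng.sample(range(dims), subspace_size))  # features drawn without replacement
        model = nearest_centroid_fit([[x[i] for i in feats] for x in X], y)
        members.append((feats, model))
    return lambda x: majority_vote(model([x[i] for i in feats]) for feats, model in members)


def accuracy(predict: Predictor, X, y) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)


# Toy, made-up feature vectors (placeholders for extracted features); 1 = satirical, 0 = not.
X = [[0.1, 0.2, 0.0, 0.3], [0.3, 0.1, 0.2, 0.0], [0.0, 0.4, 0.1, 0.2], [0.2, 0.0, 0.3, 0.1],
     [5.1, 4.9, 5.0, 5.2], [4.8, 5.2, 5.1, 4.9], [5.0, 5.0, 4.7, 5.3], [5.3, 4.8, 5.2, 5.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

bag = bagging(X, y)
rsm = random_subspace(X, y)
print(accuracy(bag, X, y), accuracy(rsm, X, y))
```

Replacing `nearest_centroid_fit` with one of the five base learners listed above would mirror the described setup more closely. The majority vote is one common way to combine members, assumed here for simplicity.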