Enhancing News Tweet Classification Through Preprocessing Techniques
Abstract
In today's technological era, social media platforms have reshaped the dissemination of news, and Twitter has emerged as a major source of real-time news updates. Because a large volume of news tweets is generated every second, a system that classifies news content accurately is needed for effective real-time media monitoring. This research introduces a machine-learning-based approach that improves the classification of news tweets through preprocessing techniques. A combination of preprocessing steps is applied to Wall Street Journal news tweets. This preprocessing, designed specifically for Twitter, removes URLs, mentions, and emoticons in addition to applying basic text preprocessing. The preprocessed text corpus is evaluated with several machine learning models, and the Support Vector Machine (SVM) outperforms the others with an accuracy of 95%.
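To illustrate the Twitter-specific cleaning steps described above, the following minimal Python sketch removes URLs, mentions, and emoticons and then applies basic preprocessing (lowercasing, punctuation removal, whitespace tokenization, and stop-word removal). The regular expressions, the emoji code-point ranges, the small STOPWORDS set, and the function name preprocess_tweet are hypothetical choices made for illustration; the exact rules used in this work are not specified here.

```python
import re
import string

# Hypothetical patterns; the paper does not state its exact regular expressions.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
MENTION_PATTERN = re.compile(r"@\w+")
# Standalone text emoticons such as :) ;-( =D (a small assumed subset).
EMOTICON_PATTERN = re.compile(r"(?<!\S)[:;=]['-]?[)(\]\[DPp/\\|](?!\S)")
# Emoji in a few common Unicode blocks (assumed ranges, not exhaustive).
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# Tiny illustrative stop-word list; a real pipeline would use a fuller list.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on",
             "for", "after", "via", "is", "are"}

# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess_tweet(text: str) -> list[str]:
    """Clean one tweet and return its list of tokens."""
    # Twitter-specific steps: replace URLs, mentions, emoticons, emoji with spaces.
    text = URL_PATTERN.sub(" ", text)
    text = MENTION_PATTERN.sub(" ", text)
    text = EMOTICON_PATTERN.sub(" ", text)
    text = EMOJI_PATTERN.sub(" ", text)
    # Basic steps: lowercase, strip punctuation (keeps hashtag words), tokenize.
    text = text.lower().translate(PUNCT_TABLE)
    tokens = text.split()
    # Drop stop words.
    return [tok for tok in tokens if tok not in STOPWORDS]


if __name__ == "__main__":
    tweet = "Stocks rally after Fed decision https://t.co/abc123 via @WSJ :) \U0001F4C8"
    print(preprocess_tweet(tweet))
```

Running the script prints `['stocks', 'rally', 'fed', 'decision']`. The token lists produced this way would then be vectorized before being passed to classifiers such as an SVM.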
Keywords: Wall Street Journal; Tokenization; Vectorization; Machine Learning Models; Deep Learning Models; Text Classification; Twitter; Preprocessing; Data Mining; Feature Engineering