UTILIZING DEEP LEARNING AND LINGUISTIC EMBEDDINGS FOR TWITTER BOT IDENTIFICATION

Roha Ishfaq; Muhammad Kamran Abid; Muhammad Fuzail; Talha Farooq Khan; Ahmad Naeem; Naeem Aslam

Abstract

Social media sites such as Twitter have become key instruments of communication, but their popularity has brought a proliferation of automated accounts, or bots, which can spread misinformation, manipulate public opinion, and disrupt online conversations. Detection approaches based on hand-crafted features and network topology have become outdated as bot behavior increasingly resembles human behavior. This study addresses the problem and proposes a new method of Twitter bot detection based on deep learning models with linguistic embeddings, including BERT and BiGRU. These models use contextual embeddings to derive meaning automatically from tweet data and do not depend on manual feature engineering. The results show that the deep learning models, BERT in particular, significantly outperform traditional models such as LSTM and CNN, classifying accounts more accurately with few false positives and false negatives. A feature importance analysis complements the model by isolating the most influential features, including tweet length, link use, and hashtag frequency. This makes the model's decisions clearer and easier to interpret, and so makes it a more effective bot-detection tool. The novel contribution of the paper is its use of contextual embeddings, which allow the model to capture linguistic complexity that conventional approaches often miss. The paper nevertheless has limitations: it relies on labeled data, and the model has not been deployed in a real-time setting. Potential directions for future research include real-time bot detection and extending the model to other social media platforms, which would broaden its use and coverage in combating online manipulation.
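As a rough illustration of the three surface features named in the feature importance analysis (tweet length, link use, and hashtag frequency), the minimal sketch below computes them for a single tweet. The function name `surface_features`, the regular expressions, and the sample tweet are assumptions made for this example and are not taken from the paper; the authors' actual preprocessing may differ.

```python
import re

# Heuristic patterns (assumptions for illustration, not the authors' rules).
URL_PATTERN = re.compile(r"https?://\S+")
HASHTAG_PATTERN = re.compile(r"#\w+")


def surface_features(tweet: str) -> dict:
    """Compute three simple per-tweet features: length, link use, hashtag count."""
    return {
        "length": len(tweet),                                  # characters in the tweet
        "has_link": bool(URL_PATTERN.search(tweet)),           # True if any http(s) URL appears
        "hashtag_count": len(HASHTAG_PATTERN.findall(tweet)),  # number of #tags
    }


# Hypothetical sample tweet, used only to show the call.
sample = "Win a free phone now! https://example.com #giveaway #free"
print(surface_features(sample))
```

Features of this kind could sit alongside the contextual embeddings as interpretable signals. Note that the hashtag pattern is a simple heuristic and would also count a `#fragment` inside a URL.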

Keywords:

Social media; Twitter; Bot detection; Feature extraction; Feature selection; Machine learning; Deep learning; Attention mechanism; Natural Language Processing