12. Impact of Text Preprocessing Techniques on Fake News Detection
Conference: Women in Statistics and Data Science 2024
10/17/2024: 11:45 AM - 1:15 PM EDT
Speed
The journey from raw text to actionable insights in natural language processing involves several critical preprocessing stages. These stages prepare textual data for analysis through strategies such as eliminating infrequently occurring words, removing stopwords, removing numerical tokens, and standardizing text to lowercase. The processed text then undergoes word embedding using algorithms such as Word2Vec and BERT. This study examines how various text preprocessing and word embedding techniques influence the effectiveness of fake news detection systems. Specifically, it assesses how the choice of classification, embedding, and preprocessing techniques affects key metrics such as accuracy, precision, sensitivity, and specificity in the context of fake news identification. Our findings indicate that the strategic retention of stopwords, particularly in conjunction with BERT embeddings, enhances the performance of fake news detection models, as does careful selection of the word-frequency threshold.
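The preprocessing stages described above can be sketched as a small pipeline. This is an illustrative example only, not the authors' implementation: the abstract does not name specific tools, so the regex tokenizer, the minimal stopword set, and the `min_freq` threshold below are assumptions (real pipelines typically use a standard stopword list such as NLTK's).

```python
import re
from collections import Counter

# Hypothetical minimal stopword list for illustration; production pipelines
# would use a standard list (e.g., NLTK or spaCy stopwords).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}

def preprocess(docs, min_freq=2, remove_stopwords=True):
    """Lowercase text, drop numerals, optionally remove stopwords, and
    eliminate words occurring fewer than min_freq times across the corpus."""
    tokenized = []
    for doc in docs:
        # Lowercasing plus a letters-only regex removes numbers and punctuation.
        tokens = re.findall(r"[a-z]+", doc.lower())
        if remove_stopwords:
            tokens = [t for t in tokens if t not in STOPWORDS]
        tokenized.append(tokens)
    # Corpus-wide frequency threshold for eliminating infrequent words.
    counts = Counter(t for doc in tokenized for t in doc)
    return [[t for t in doc if counts[t] >= min_freq] for doc in tokenized]

docs = [
    "Breaking: 5 reasons the claim is false",
    "The claim was checked and found false",
]
print(preprocess(docs, min_freq=2))  # → [['claim', 'false'], ['claim', 'false']]
```

The output of such a pipeline would then be fed to an embedding step (e.g., Word2Vec or BERT); note that for BERT, which uses its own subword tokenizer, the study's finding suggests leaving stopwords in place rather than removing them here.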
Presenting Author
Jessica Hauschild, United States Air Force Academy
First Author
Jessica Hauschild, United States Air Force Academy
Co-Author
Kent Eskridge, University of Nebraska, Statistics Department
Target Audience
Mid-Level
Tracks
Knowledge
Women in Statistics and Data Science 2024