Deciphering Article Popularity in the Digital Era: Comprehending Public Attitude with Supervised Machine Learning Models

Conference: Symposium on Data Science and Statistics (SDSS) 2024
06/06/2024: 1:55 PM - 2:00 PM EDT
Lightning 

Description

The age of the Internet has transformed information retrieval and engagement with the news. 2020 was marked by a series of unprecedented events, such as the COVID-19 pandemic, the murder of George Floyd, and the crucial U.S. presidential election. In a survey conducted by the Pew Research Center in 2020, a little over half of the respondents (53%) said that they got their news from social media and digital platforms. An increasing number of individuals also actively participated in discussions on various platforms, whether by commenting on media platforms or by posting on social media such as Twitter. As discussions around these events proliferated across social media and news outlets, understanding the factors driving the popularity of articles became paramount. Articles were collected from The New York Times between 01/01/2020 and 12/31/2020 to understand how features of user engagement and characteristics of the article itself relate to popularity. A total of 16,787 articles were included in our analysis, with information recorded on each article's section, headline, abstract, keywords, word count, publication date, number of comments, sentiment, and popularity. Supervised machine learning models, including linear, ridge, lasso, random forest, and gradient boosting regressions, were fit to the data. Based on feature selection using the random forest, publication date (0.324), section (0.305), and word count (0.371) had the greatest impact on article engagement, while sentiment had no influence on the popularity of an article. Using those features, top sections and keywords were identified from popular articles, and temporal trends were explored to gauge discourse intensity during specific periods. Analysis of the sections with the highest comment counts revealed keywords centered on major events like COVID-19, the 2020 election, and the killing of George Floyd, with engagement peaking during the summer of 2020.
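To make the feature-selection step concrete, here is a minimal sketch of how random forest importances could be computed with scikit-learn. It is not the author's pipeline. The column names (section, word_count, day_of_year, sentiment, n_comments), the helper rank_features, and the synthetic data are all assumptions made for illustration, and integer-coding the section column is just one simple way to get a single importance value per feature. scikit-learn normalizes impurity-based importances to sum to 1, which would be consistent with the three values reported above, although the abstract does not say which importance measure was used.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def rank_features(df, feature_cols, target_col, seed=0):
    """Fit a random forest and return impurity-based importances, largest first.

    Hypothetical helper for illustration only.
    """
    X = df[feature_cols].copy()
    # Encode non-numeric columns as integer codes so that each original
    # feature receives exactly one importance value.
    for col in X.columns:
        if not pd.api.types.is_numeric_dtype(X[col]):
            X[col] = X[col].astype("category").cat.codes
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(X, df[target_col])
    scores = pd.Series(model.feature_importances_, index=feature_cols)
    return scores.sort_values(ascending=False)


# Synthetic stand-in data (assumed, not the New York Times dataset).
rng = np.random.default_rng(0)
n = 500
toy = pd.DataFrame({
    "section": rng.choice(["Opinion", "U.S.", "World"], size=n),
    "word_count": rng.integers(200, 3000, size=n),
    "day_of_year": rng.integers(1, 367, size=n),
    "sentiment": rng.uniform(-1, 1, size=n),
})
# By construction, only word_count drives the toy target.
toy["n_comments"] = 0.1 * toy["word_count"] + rng.normal(0, 10, size=n)

importances = rank_features(
    toy, ["section", "word_count", "day_of_year", "sentiment"], "n_comments"
)
print(importances)
```

Impurity-based importances can favor features with many distinct values, so a permutation-based check on held-out data is a reasonable complement before drawing conclusions about which features matter.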

Keywords

machine learning

2020

predictive modeling

digital news

public discourse

feature selection 

Presenting Author

Anusha Natarajan

First Author

Anusha Natarajan

Tracks

Practice and Applications