Use NER to Derive Health Policy Insights from Social Media Data

Conference: International Conference on Health Policy Statistics 2023
01/11/2023: 11:30 AM - 11:45 AM MST
Contributed 

Description

Introduction
Up-to-date information about public opinion and marketing dynamics surrounding tobacco and vaping products is vital to public health policy. Many marketing campaigns target young audiences, and misinformation about the health impacts of these products spreads frequently in certain communities. One central task in monitoring marketing and public opinion around tobacco and vaping products is identifying specific brands and product flavors. Social media data offer the promise of near real-time monitoring, but because the data are unstructured and largely text, locating mentions of brands and flavors is methodologically onerous. This presentation showcases NORC's efforts to address this challenge using natural language processing (NLP).
Methods
Specifically, we use named-entity recognition (NER) to locate mentions of brands and flavors in Twitter posts. NER involves identifying specific words or character strings within a larger text that are instances of a type of entity. For example, if brand and flavor are types of entity, 'JUUL' and 'menthol' would be instances of brand and flavor, respectively. Many off-the-shelf tools exist for NER, but they typically do not identify brands and flavors, instead identifying entities like persons, organizations, times, dates, and currencies. Custom NER has proven to be a powerful tool for identifying brand and flavor mentions on other social media platforms, such as Instagram (Chew et al., 2022).
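To make the idea of entity instances concrete, the following is a minimal sketch of the kind of output an NER system produces: a label plus character offsets for each mention. It uses a simple lexicon lookup, which is a stand-in for illustration only and is not the fine-tuned model described in this abstract. The two-term lexicon and the example sentence are assumptions.

```python
import re
from typing import NamedTuple


class Entity(NamedTuple):
    text: str   # the matched string as it appears in the post
    label: str  # entity type, e.g. BRAND or FLAVOR
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention


# Hypothetical mini-lexicon for illustration; a real list would be far larger.
LEXICON = {"BRAND": ["juul"], "FLAVOR": ["menthol"]}


def find_entities(text, lexicon=LEXICON):
    """Return whole-word, case-insensitive lexicon matches as labeled spans."""
    entities = []
    for label, terms in lexicon.items():
        for term in terms:
            pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            for match in pattern.finditer(text):
                entities.append(
                    Entity(match.group(), label, match.start(), match.end())
                )
    # Order mentions by their position in the text.
    return sorted(entities, key=lambda e: e.start)


example = "Just tried the menthol JUUL pods"
entities = find_entities(example)
for entity in entities:
    print(entity)
```

A lookup like this misses misspellings, slang, and brands absent from the list, which is one reason to train a custom model instead.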
Our team used Azure Cognitive Service for Language to develop a custom NER model that identifies mentions of brands and flavors within Twitter posts about tobacco and vaping products. Doing so enabled us to leverage transfer learning: Azure provides a pre-trained language model, and we fine-tuned it on a set of tweets specifically pertaining to tobacco and vaping products. Although Azure Custom NER imposes certain methodological restrictions, this downside is outweighed by the ability to rapidly cycle through the custom NER model development process.
Data
Twitter is one of the most widely used social media platforms and is regularly used to monitor public health, for example to inform public health policy surrounding COVID-19. We collected Twitter data based on tobacco- and vaping-related search terms, then constructed a training sample based on the presence of in-text mentions of popular vape brands and flavors; the list of brands and flavors was informed by transaction data from 2014 through 2018. In total, the training sample comprised 2,311 brand mentions and 2,339 flavor mentions from 2,242 tweets. We used an 80/20 train-test split.
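As a rough illustration of an 80/20 split, the sketch below shuffles a list of tweet identifiers and cuts it at 80%. It assumes the split is random and made at the tweet level, neither of which the abstract specifies. The function name, seed, and integer identifiers are placeholders.

```python
import random


def train_test_split(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of items and cut it into train and test portions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder identifiers standing in for the 2,242 labeled tweets.
tweet_ids = list(range(2242))
train, test = train_test_split(tweet_ids)
print(len(train), len(test))
```

Under these assumptions the training portion holds 1,793 tweets and the test portion 449.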
Results
Our model achieved high performance in detecting brand and flavor mentions. For brands, we observed an F1 score of 90.48% (precision of 90.39% and recall of 90.57%), and for flavors, we observed an F1 score of 90.27% (precision of 90.17% and recall of 90.36%). In addition to this performance, we realized substantial reductions in the time spent on coding, data labeling, model deployment, and maintenance compared to the time historically spent on these tasks.
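For readers less familiar with the metric, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it from the reported precision and recall values. Because those inputs are already rounded to two decimals, the recomputed scores can differ from the reported ones in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported above, in percent.
brand_f1 = f1_score(90.39, 90.57)
flavor_f1 = f1_score(90.17, 90.36)
print(f"brand F1: {brand_f1:.2f}%  flavor F1: {flavor_f1:.2f}%")
```

Both recomputed values agree with the reported F1 scores to within 0.01 percentage points.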
Conclusion
Our efforts illustrate that custom NER can upgrade the pipeline from health data to health policy. In addition to high performance, cloud-based services that enable rapid model development, iteration, and deployment empower researchers to keep pace with the rapidly changing market and associated health concerns. Our presentation will conclude with a discussion of ongoing efforts to improve model performance and of practical insights into when custom NER and cloud-based computing solutions are effective tools.
References
Chew R, Wenger M, Guillory J, Nonnemaker J, Kim A. Identifying Electronic Nicotine Delivery System Brands and Flavors on Instagram: Natural Language Processing Analysis. J Med Internet Res. 2022 Jan 18;24(1):e30257.

Keywords

natural language processing

named-entity recognition

social media data

public health

cloud computing

electronic nicotine delivery systems 

Presenting Author

Andrew Norris, NORC at the University of Chicago

First Author

Andrew Norris, NORC at the University of Chicago

CoAuthor(s)

Brandon Sepulvado
Yoonsang Kim, NORC at the University of Chicago
Ganna Kostygina, NORC at the University of Chicago
Sherry Emery