028 - GRANDPA: GeneRAtive Networks using Degree and Property Augmentation for the simulation and generation of privacy-preserving healthcare networks

Presented During: Welcome Reception & Poster Session I

Conference: International Conference on Health Policy Statistics 2023

01/09/2023: 5:30 PM - 6:30 PM MST
Posters

Description

Social networks of healthcare utilization and processes are often constructed using confidential Medicare administrative data. This confidentiality hinders researchers' ability to study and distribute data that can aid reproducible research. Network analysis provides a powerful approach to investigating the complex structure of relationships in various contexts, such as healthcare delivery and biology. In conjunction with graph topological structures, patterns of node properties or attributes across the network provide insights into the key (vertex, edge, and sub-network) features underlying the network. For example, in health services research, node properties, such as a physician's specialty, may help explain the community structure in the network. In this work, we propose a graph simulation model that generates networks using degree and property augmentation (GRANDPA), and we include a flexible R package that lets users generate healthcare networks that emulate the original but can be freely distributed and analyzed.

Our GRANDPA model is built upon the Property Graph Model (PGM) introduced by Sathanur et al.[1] PGM simulates graphs from two empirical distributions: the joint label assignment distribution for vertices, i.e., the probability of every possible combination of attribute categories, and the joint distribution over pairs of label categories, which models edge connectivity. A degree label can optionally be added by dividing the degree distribution into bins. Along with the degree distribution, community structure has previously been shown to be useful in recovering the network structure.[2] GRANDPA extends the PGM framework by allowing the additional augmentation of a community detection label as well as other topologically defined vertex characteristics.
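To make the two distributions concrete, the following is a minimal Python sketch (not the GRANDPA R package and not the PGM reference implementation) that estimates them empirically from a toy labeled graph. The function name `label_distributions`, the toy labels, and the choice to treat edges as undirected, unordered label pairs are all assumptions made for illustration.

```python
from collections import Counter

def label_distributions(vertex_labels, edges):
    """Estimate the two empirical distributions described above.

    vertex_labels: dict mapping vertex -> tuple of categorical labels
    edges: iterable of undirected (u, v) pairs
    Hypothetical helper for illustration only.
    """
    n = len(vertex_labels)
    # Joint label assignment distribution over vertices.
    vertex_counts = Counter(vertex_labels.values())
    vertex_dist = {lab: c / n for lab, c in vertex_counts.items()}

    # Joint distribution over pairs of label combinations across edges.
    edge_counts = Counter()
    for u, v in edges:
        # Sort so that (a, b) and (b, a) count as the same unordered pair.
        pair = tuple(sorted((vertex_labels[u], vertex_labels[v])))
        edge_counts[pair] += 1
    m = sum(edge_counts.values())
    edge_dist = {pair: c / m for pair, c in edge_counts.items()}
    return vertex_dist, edge_dist

# Toy example: labels are (specialty, degree bin); values are made up.
toy_labels = {
    1: ("cardiology", "high"),
    2: ("cardiology", "low"),
    3: ("oncology", "low"),
    4: ("oncology", "low"),
}
toy_edges = [(1, 2), (1, 3), (1, 4)]
vertex_dist, edge_dist = label_distributions(toy_labels, toy_edges)
```

In this toy graph, half of the vertices carry the combination ("oncology", "low"), and two of the three edges join a ("cardiology", "high") vertex to an ("oncology", "low") vertex. A simulator in this family would sample vertex labels from the first distribution and edges from the second.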

To demonstrate GRANDPA's potential, we conducted two case studies. The first used Zachary's karate club network[3] included in the igraph package in R,[4] for which we created two vertex labels associated with network sub-structures. In the second case study, we generated a unipartite patient-sharing network from 2019 Medicare claims data and used each physician's primary medical specialty as a node attribute label. For both case studies, we augmented the generated graphs with a degree distribution label and a community membership label.[5] In addition, we added an attribute label by computing the linchpin score of each physician to identify their importance as a "one-of-a-kind" physician to their neighbors based on their specialty.[6]
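The degree augmentation step can be sketched as follows. This is an illustrative Python example under stated assumptions, not the authors' R code: the helper `degree_bin_labels`, the bin boundaries, and the toy specialties are hypothetical, and the abstract does not say how the degree bins are chosen.

```python
from bisect import bisect_left
from collections import defaultdict

def degree_bin_labels(nodes, edges, bin_upper_bounds):
    """Assign each node the index of the degree bin it falls into.

    bin_upper_bounds: sorted inclusive upper bounds, e.g. [1, 2] gives
    bin 0 for degree <= 1, bin 1 for degree 2, bin 2 for degree > 2.
    The binning scheme is an assumption made for this sketch.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Nodes without edges keep degree 0 and land in the first bin.
    return {node: bisect_left(bin_upper_bounds, degree[node]) for node in nodes}

# Toy graph with a made-up specialty attribute per node.
specialty = {1: "cardiology", 2: "cardiology", 3: "oncology", 4: "oncology"}
edges = [(1, 2), (1, 3), (1, 4), (2, 3)]

bins = degree_bin_labels(specialty.keys(), edges, bin_upper_bounds=[1, 2])
# Augment the attribute label with the topological degree-bin label.
augmented = {node: (specialty[node], bins[node]) for node in specialty}
```

A community membership label could be appended to each tuple in the same way once a community detection algorithm has assigned each node to a community.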

To assess network recovery, we compared the original graph with generated graphs. In these case studies, combining degree and community augmentation led to generated networks that best resembled the original networks. In the karate network, the normalized root mean square error (NRMSE) between the complementary cumulative distribution functions (CCDFs) of the degree distributions of the original graph and the best generated graph is 0.0508. Visually, the original graph and the final generated graph have nearly identical topological structures. For the unipartite healthcare network, the NRMSE between the CCDFs of the degree distributions of the original graph and the best generated graph is 0.0514. Likewise, the final graph has the same number of communities as the original and similar internal community structures.
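A comparison of this kind can be sketched in Python as below. The abstract does not state how the CCDF is defined or how the RMSE is normalized, so this example assumes a CCDF of the form P(D >= k) and normalization by the range of the original CCDF; the function names are hypothetical, and the sketch is not expected to reproduce the reported values.

```python
import math

def degree_ccdf(degrees, max_degree):
    """Empirical CCDF, P(D >= k), for k = 0..max_degree (assumed definition)."""
    n = len(degrees)
    return [sum(d >= k for d in degrees) / n for k in range(max_degree + 1)]

def ccdf_nrmse(orig_degrees, gen_degrees):
    """RMSE between two degree CCDFs, divided by the range of the original CCDF.

    The normalization is an assumption; other conventions (e.g. dividing
    by the mean) are also common.
    """
    k_max = max(max(orig_degrees), max(gen_degrees))
    orig = degree_ccdf(orig_degrees, k_max)
    gen = degree_ccdf(gen_degrees, k_max)
    rmse = math.sqrt(sum((o - g) ** 2 for o, g in zip(orig, gen)) / len(orig))
    value_range = max(orig) - min(orig)
    return rmse / value_range if value_range > 0 else float("nan")

# Toy degree sequences (made up): identical sequences give zero error.
same = ccdf_nrmse([1, 2, 3], [1, 2, 3])
different = ccdf_nrmse([1, 1, 2], [1, 2, 2])
```

Smaller values indicate that the generated graph's degree distribution tracks the original more closely.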

References:

[1] A. V. Sathanur, S. Choudhury, C. Joslyn, and S. Purohit, "When Labels Fall Short: Property Graph Simulation via Blending of Network Structure and Vertex Attributes," Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 2287–2290, Nov. 2017, doi: 10.1145/3132847.3133065.
[2] B. Karrer and M. E. J. Newman, "Stochastic blockmodels and community structure in networks," Phys. Rev. E, vol. 83, no. 1, p. 016107, Jan. 2011, doi: 10.1103/PhysRevE.83.016107.
[3] W. W. Zachary, "An Information Flow Model for Conflict and Fission in Small Groups," Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, Dec. 1977, doi: 10.1086/jar.33.4.3629752.
[4] G. Csardi and T. Nepusz, "The igraph software package for complex network research," p. 10.
[5] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi, "Defining and identifying communities in networks," Proc. Natl. Acad. Sci. U.S.A., vol. 101, no. 9, pp. 2658–2663, Mar. 2004, doi: 10.1073/pnas.0400054101.
[6] M. D. Nemesure, T. M. Schwedhelm, S. Sacerdote, A. J. O'Malley, L. R. Rozema, and E. L. Moen, "A measure of local uniqueness to identify linchpins in a social network with node attributes," Appl Netw Sci, vol. 6, no. 1, p. 56, Dec. 2021, doi: 10.1007/s41109-021-00400-8.

Speaker

Yifan Zhao, Program in Quantitative Biomedical Sciences, Geisel School of Medicine at Dartmouth

CoAuthor(s)

Carly Bobak
James O'Malley, Dartmouth University, Geisel School of Medicine