Gender Differences in the Development of R Packages on GitHub

Conference: Symposium on Data Science and Statistics (SDSS) 2024
06/05/2024: 11:00 AM - 11:25 AM EDT
Refereed 

Description

Analyzing gender dynamics in scientific research and its outputs is crucial for ensuring that science policy is inclusive and equitable. Like other research outputs such as publications and patents, open source software (OSS) projects are developed by contributors from universities, government research institutions, and nonprofits, in addition to businesses. Despite the reach and continued rapid growth of OSS, reliable and comprehensive survey data on it do not exist, limiting insights into contributions by gender and policy-makers' ability to assess trends in gender representation. As in scientific research, the inclusion of diverse perspectives in software development enhances creativity and problem-solving. Using GitHub data, researchers have found positive correlations between the gender diversity of an OSS development team and its productivity (Vasilescu et al., 2015; Ortu et al., 2017). Yet there is evidence of gender bias, with women facing higher standards to have their contributions accepted (Terrell et al., 2017; Imtiaz et al., 2019).

This exploratory study aims to quantify gender differences in the development and use (impact) of OSS using publicly available information collected from GitHub. We focus on software packages developed for the R programming language, whose contributors come mostly from academia. The paper asks: (1) What are the gender differences in the volume of contributions? (2) Has gender representation shifted over time? (3) Is there a correlation between the gender of contributors and the impact of a package? Our dataset includes 1,883,977 commits to 7,016 registered R packages from 2008 to mid-2023 and information about 14,311 unique contributors. Using percentage breakdowns, we show how different gender groups contributed to OSS projects in terms of commits, lines of code, and package ownership.
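As a rough illustration of what a percentage breakdown of contributions by gender group could look like, the following minimal Python sketch computes each group's share of commits and of lines of code. The records, the gender labels, and the function name `percentage_breakdown` are hypothetical and chosen only for illustration; they do not come from the study's dataset, and the abstract does not describe how gender was assigned to contributors.

```python
from collections import defaultdict

# Hypothetical commit records: (contributor gender label, lines of code changed).
# These rows are illustrative only and are NOT drawn from the study's dataset.
commits = [
    ("woman", 120),
    ("man", 300),
    ("man", 80),
    ("unknown", 50),
    ("woman", 50),
]


def percentage_breakdown(records):
    """Return each group's percentage share of commits and of lines of code."""
    commit_counts = defaultdict(int)
    line_counts = defaultdict(int)
    for gender, lines in records:
        commit_counts[gender] += 1
        line_counts[gender] += lines

    total_commits = sum(commit_counts.values())
    total_lines = sum(line_counts.values())
    if total_commits == 0 or total_lines == 0:
        # Nothing to apportion; avoid dividing by zero.
        return {}

    return {
        gender: {
            "commit_pct": 100 * commit_counts[gender] / total_commits,
            "lines_pct": 100 * line_counts[gender] / total_lines,
        }
        for gender in commit_counts
    }


shares = percentage_breakdown(commits)
for gender, pct in sorted(shares.items()):
    print(f"{gender}: {pct['commit_pct']:.1f}% of commits, "
          f"{pct['lines_pct']:.1f}% of lines")
```

Package ownership shares could be tallied the same way, assuming one record per package labeled with its owner's group.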

Keywords

open source software

gender gap

R

dependency networks

science of science 

Presenting Author

Carol Moore

First Author

Carol Moore

CoAuthor(s)

Uyen Nguyen, University of Virginia
Gizem Korkmaz, Westat

Tracks

Practice and Applications
Symposium on Data Science and Statistics (SDSS) 2024