Attributing Credit and Measuring Impact of Open Source Software Using Fractional Counting

Conference: Symposium on Data Science and Statistics (SDSS) 2024
06/06/2024: 1:40 PM - 1:45 PM EDT
Lightning 

Description

Open source software (OSS) has become essential to knowledge production and innovation in both academic and business sectors around the globe. OSS is developed by a variety of entities and is considered a "unique scholarly activity" due to the complexity of scientific computational tasks and the necessity of cooperation and transparency for research methodology. While the developers of OSS are thought to be widely distributed, many questions remain about who these contributors are, which countries, sectors, and organizations contribute the most, and how they influence each other.

Using data collected on Python and R packages from GitHub, we apply fractional-counting methods to measure the contribution of each developer, weighting by the lines of code added, and sum these weighted contributions to attribute OSS to countries. We find that for both Python and R, developers from a small group of top countries account for a considerable share of code additions. Developers from the top 10 countries, which include the United States, Germany, the United Kingdom, France, and China, account for 76.1% of the total R repositories and 66.6% of Python repositories.
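
As a rough illustration of the weighted fractional counting described above, the sketch below gives each repository one unit of credit, splits that unit among contributors in proportion to lines of code added, and sums the resulting fractions by country. The record layout, the package names, and the function name `country_shares` are hypothetical; they are assumptions for illustration and are not taken from the study's data or code.

```python
from collections import defaultdict


def country_shares(records):
    """Sum fractional repository credit by country.

    records: list of (repo, country, lines_added) tuples (hypothetical layout).
    Each repository carries one unit of credit, divided among its
    contributors in proportion to the lines of code they added.
    """
    # First pass: total lines added per repository.
    repo_totals = defaultdict(int)
    for repo, _country, lines in records:
        repo_totals[repo] += lines

    # Second pass: each contribution earns lines / repo_total of one unit.
    credit = defaultdict(float)
    for repo, country, lines in records:
        if repo_totals[repo] > 0:  # skip repositories with no added lines
            credit[country] += lines / repo_totals[repo]
    return dict(credit)


# Toy data, invented for illustration only.
records = [
    ("pkg_a", "US", 300),
    ("pkg_a", "DE", 100),
    ("pkg_b", "US", 50),
    ("pkg_b", "FR", 50),
]
shares = country_shares(records)
```

Under this scheme the country fractions sum to the number of repositories, so dividing each country's total by that number gives its share of all repositories.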

Next, we use the dependency relationships between packages to study the pairwise connections between countries and measure their respective impact. Based on the total dependency fractions, the packages attributed to the United States are most frequently reused by packages from Germany, Spain, Italy, Australia, and the United Kingdom. In turn, the United States mostly uses packages from Germany, France, and Denmark.
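
One possible way to compute pairwise dependency fractions between countries is sketched below. It assumes that each package already has country shares summing to 1 and that a single dependency link is split across country pairs by multiplying the share of the using country by the share of the used country. This product rule, the data layout, and the function name `dependency_fractions` are assumptions for illustration; the abstract does not specify how the fractions are actually computed.

```python
from collections import defaultdict


def dependency_fractions(package_shares, dependencies):
    """Aggregate package dependencies into country-to-country fractions.

    package_shares: {package: {country: share}}, shares per package sum to 1.
    dependencies: list of (user_pkg, used_pkg) pairs, where user_pkg
    depends on used_pkg.
    Returns {(user_country, used_country): total fraction}.
    """
    flows = defaultdict(float)
    for user_pkg, used_pkg in dependencies:
        for user_country, user_share in package_shares[user_pkg].items():
            for used_country, used_share in package_shares[used_pkg].items():
                # Assumed rule: one dependency link is split by the
                # product of the two countries' shares.
                flows[(user_country, used_country)] += user_share * used_share
    return dict(flows)


# Toy data, invented for illustration only.
package_shares = {
    "pkg_a": {"US": 0.75, "DE": 0.25},
    "pkg_b": {"US": 0.5, "FR": 0.5},
}
dependencies = [("pkg_b", "pkg_a")]  # pkg_b depends on pkg_a
flows = dependency_fractions(package_shares, dependencies)
```

With this rule each dependency link contributes exactly one unit in total, so the fractions can be ranked per country to see whose packages it uses most and who reuses its packages most.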

Influential contributors to OSS can strongly shape the priorities and practices of scientific research when their work is widely used or built upon by other researchers. In this context, studying the global distribution, collaboration, and impact of contributors is important for understanding the landscape of innovation in scientific research.

Keywords

Open source software

Science of science

Bibliometrics

Fractional counting

GitHub

Dependency networks 

Presenting Author

Nick Askew, Westat

First Author

Nick Askew, Westat

CoAuthor(s)

Gizem Korkmaz, Westat
Clara Boothby, National Center for Science and Engineering Statistics

Tracks

Practice and Applications
Symposium on Data Science and Statistics (SDSS) 2024