06/06/2024: 1:40 PM - 1:45 PM EDT
Lightning
Open source software (OSS) has become essential to knowledge production and innovation in both academic and business sectors around the globe. OSS is developed by a variety of entities and is considered a "unique scholarly activity" due to the complexity of scientific computational tasks and the necessity of cooperation and transparency for research methodology. While the developers of OSS are thought to be very widespread, many questions remain about who these contributors are, which countries, sectors, and organizations contribute the most, and how they influence each other.
Using data collected on Python and R packages from GitHub, we leverage fractional-counting methods to measure the contribution of each developer and use weighted counting based on the lines of code added to sum the contribution of each country to OSS. We find that for both Python and R, developers from a small group of top countries account for a considerable share of code additions. Developers from the top 10 countries, which include the United States, Germany, the United Kingdom, France, and China, account for 76.1% of R repositories and 66.6% of Python repositories.
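As an illustration of the counting approach described above, the following is a minimal sketch, not the authors' implementation. It assumes each package counts as one unit of credit, split among countries in proportion to the lines of code their developers added. The record layout, the package and developer names, and the functions `country_shares` and `country_totals` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (package, developer, country, lines_added).
commits = [
    ("pkgA", "dev1", "US", 600),
    ("pkgA", "dev2", "DE", 300),
    ("pkgA", "dev3", "US", 100),
    ("pkgB", "dev4", "FR", 50),
    ("pkgB", "dev2", "DE", 150),
]

def country_shares(records):
    """Return, for each package, each country's fraction of lines added."""
    totals = defaultdict(int)
    by_country = defaultdict(lambda: defaultdict(int))
    for package, _developer, country, lines in records:
        totals[package] += lines
        by_country[package][country] += lines
    return {
        pkg: {c: n / totals[pkg] for c, n in countries.items()}
        for pkg, countries in by_country.items()
        if totals[pkg] > 0  # skip packages with no added lines
    }

def country_totals(shares):
    """Sum fractional package credit per country (assumption: one unit per package)."""
    out = defaultdict(float)
    for fractions in shares.values():
        for country, frac in fractions.items():
            out[country] += frac
    return dict(out)

shares = country_shares(commits)
totals = country_totals(shares)
```

Because each package's fractions sum to one, the country totals sum to the number of packages, so no package is counted more than once regardless of how many countries contributed to it.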
Next, we use the dependency relationships between packages to study the pairwise connections between countries and measure their respective impact. Based on total dependency fractions, packages attributed to the United States are most frequently reused by packages from Germany, Spain, Italy, Australia, and the United Kingdom. Conversely, the United States mostly uses packages from Germany, France, and Denmark.
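One way to picture a pairwise dependency fraction is sketched below. This is an assumption about how such a measure could be computed, not the method used in the study: each dependency edge carries one unit of weight, split by the product of the dependent package's country fractions and the depended-on package's country fractions. The input data and the function `country_dependency_matrix` are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-package country fractions (each package sums to 1).
shares = {
    "pkgA": {"US": 0.7, "DE": 0.3},
    "pkgB": {"FR": 0.25, "DE": 0.75},
    "pkgC": {"US": 1.0},
}

# Hypothetical dependency edges: (dependent package, package it depends on).
dependencies = [("pkgB", "pkgA"), ("pkgC", "pkgA"), ("pkgA", "pkgB")]

def country_dependency_matrix(shares, edges):
    """Map (user country, provider country) to the total dependency fraction."""
    flow = defaultdict(float)
    for dependent, dependency in edges:
        for user, user_frac in shares[dependent].items():
            for provider, provider_frac in shares[dependency].items():
                # Assumption: split each edge by the product of both fractions.
                flow[(user, provider)] += user_frac * provider_frac
    return dict(flow)

flow = country_dependency_matrix(shares, dependencies)
```

Under this assumption each edge contributes exactly one unit in total, and same-country pairs such as ("DE", "DE") are retained so that domestic reuse can be separated from cross-country reuse.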
Influential contributors to OSS can strongly shape the priorities and practices of scientific research when their work is widely used or built upon by other researchers. In this context, studying the global distribution, collaboration, and impact of contributors is important for understanding the landscape of innovation in scientific research.
Open source software
Science of science
Bibliometrics
Fractional counting
GitHub
Dependency networks
Presenting Author
Nick Askew, Westat
First Author
Nick Askew, Westat
CoAuthor(s)
Gizem Korkmaz, Westat
Clara Boothby, National Center for Science and Engineering Statistics
Tracks
Practice and Applications
Symposium on Data Science and Statistics (SDSS) 2024