{"id": "46ea8cde74ce1fd2b9ec4b693331e85b3fdd02cb", "text": "An empirical study of integration activities in distributions of open source software\n\nBram Adams \u00b7 Ryan Kavanagh \u00b7 Ahmed E. Hassan \u00b7 Daniel M. German\n\nPublished online: 31 March 2015\n\u00a9 Springer Science+Business Media New York 2015\n\nAbstract Reuse of software components, either closed or open source, is considered to be one of the most important best practices in software engineering, since it reduces development cost and improves software quality. However, since reused components are (by definition) generic, they need to be customized and integrated into a specific system before they can be useful. Since this integration is system-specific, the integration effort is non-negligible and increases maintenance costs, especially if more than one component needs to be integrated. This paper performs an empirical study of multi-component integration in the context of three successful open source distributions (Debian, Ubuntu and FreeBSD). Such distributions integrate thousands of open source components with an operating system kernel to deliver a coherent software product to millions of users worldwide. We empirically identified seven major integration activities performed by the maintainers of these distributions, documented how these activities are being performed by the maintainers, then evaluated and refined the identified activities with input from six maintainers of the three studied distributions. The documented activities provide a common vocabulary for component integration in open source distributions and outline a roadmap for future research on software integration.\n\nCommunicated by: Filippo Lanubile\n\nB. Adams (\u2709)\nMCIS, Polytechnique Montr\u00e9al, Montr\u00e9al, Canada\ne-mail: bram.adams@polymtl.ca\n\nR. Kavanagh \u00b7 A. E. Hassan\nSAIL, Queen\u2019s University, Kingston, Canada\nR. Kavanagh\ne-mail: ryan@cs.queensu.ca\nA. E. Hassan\ne-mail: ahmed@cs.queensu.ca\n\nD. M. German\nUniversity of Victoria, Victoria, Canada\ne-mail: dmg@uvic.ca\nKeywords Software integration \u00b7 Software reuse \u00b7 Open source distributions \u00b7 Debian \u00b7 Ubuntu and FreeBSD\n\n1 Introduction\n\nSoftware reuse is \u201cthe use of existing software or software knowledge to construct new software\u201d (Frakes and Kang 2005). Reuse roughly consists of two major steps (Basili et al. 1996): 1. identifying a suitable component to reuse, and 2. integrating it into the target system. For example, vendors of mobile phones typically reuse an \u201cupstream\u201d (i.e., externally developed) operating system component in their device, customized with proprietary device drivers, control panels and utilities (Jaaksi 2007). Reuse is very commonplace, as shown in studies on software projects of different sizes in China, Finland, Germany, Italy and Norway (Chen et al. 2008; Hauge et al. 2008, 2010; Jaaksi 2007; Li et al. 2008, 2009). For example, almost half of the Norwegian software companies reuse \u201cOpen Source\u201d (OSS) in their products (Hauge et al. 2008), while 30 % of the functionality of OSS projects in general reuse existing components (Sojer and Henkel 2010).\n\nAlthough reuse speeds up development, leverages the expertise of the upstream project and, in general, improves the quality and cost of a product (Basili et al. 1996; Gaffney and Durek 1989; Szyperski 1998), it is not entirely risk- and cost-free. 
In particular, the integration step of reuse consumes a large amount of effort and resources (Boehm and Abts 1999; Brownsword et al. 2000; Di Cosmo et al. 2011; Morisio et al. 2002), for various reasons. "Glue code" (Yakimovich et al. 1999) needs to be developed and maintained to make a component fit into the target system, and developers need to continuously assess the impact of new versions of the component on this glue code (such a new version can bring an unpredictable set of bug fixes and features). Furthermore, the component might depend on other components, whose bugs could propagate to the target system in undocumented ways (Dogguy et al. 2010; McCamant and Ernst 2003; Orsila et al. 2008; Trezentos et al. 2010).

The ability to make local changes to the source code of a reused component introduces even more challenges, since an integrator typically is not familiar with the reused component's code base and hence can easily introduce bugs in such local changes (Hauge et al. 2010; Li et al. 2005; Merilinna and Matinlassi 2006; Stol et al. 2011; Tiangco et al. 2005; Ven and Mannaert 2008). Worse, if the local changes are not contributed back to the owner of the reused component, the organization that made the changes will need to maintain them and possibly re-apply them itself to future versions of the component (Spinellis et al. 2004; Ven and Mannaert 2008).

Thus far, most of the empirical studies on the integration of components (Brownsword et al. 2000; Hauge et al. 2010; Li et al. 2005; Merilinna and Matinlassi 2006; Morisio et al. 2002; Stol et al. 2011; Ven and Mannaert 2008) concentrated on the base case of integrating one component into a target system. In practice, however, organizations tend to integrate not one, but two or more components, which brings along a set of unique challenges (Morisio et al. 2002; Van Der Linden 2009; Ven and Mannaert 2008), especially given the popularity of open source development: in the timespan of one release, an organization needs to coordinate the integration of updates by multiple vendors, typically with totally independent release dates (Boehm and Abts 1999; Brownsword et al. 2000). For example (Jaaksi 2007), Nokia's N800 tablet platform reused 428 OSS components, 25 % of which were reused as is (e.g., bzip2 and GNU Chess), 50 % were changed locally (e.g., the graphics subsystem), and 25 % were developed in-house using open source practices ("inner source", ISS). It is unclear for organizations like Nokia how to keep their system stable and secure amidst the integration of so many different components (Hauge et al. 2010). Furthermore, there is a clear need (Boehm and Abts 1999; Crnkovic and Larssom 2002; Merilinna and Matinlassi 2006) for dedicated training and education of developers and organizations on integration, since in a world of open source they now need to collaborate with the providers of 3rd party components and other external contributors to benefit from external contributions and to avoid having to maintain bug fixes and other customizations themselves.

This paper aims to improve the understanding of multi-component integration by empirically studying and documenting the major integration activities performed by OSS distributions (Gonzalez-Barahona et al. 2009).
An OSS distribution is essentially a "packaging organization" (Ruffin and Ebert 2004; Merilinna and Matinlassi 2006), i.e., an organization that integrates upstream components into a common platform (similar to product lines (Meyer and Lehnerd 1997; Pohl et al. 2005)), ironing out bugs and intellectual property issues, and providing extensive documentation and training on the integrated components. Reusing an OSS component through an established distribution provides more confidence in the quality of the component (Tiangco et al. 2005), and hence many companies use OSS distributions as the basis for products like routers, mobile phones or storage devices (Koshy 2013). Examples of established OSS distributions are Eclipse, GNOME and operating system distributions like Debian or Ubuntu.

Here, we focus on operating system distributions (henceforth called "OSS distributions"), which bundle and customize OSS operating system kernels (e.g., Linux or BSD), system utilities (e.g., compilers and file management tools) and end-user software (e.g., text processors, games and browsers) with a dependency-aware package system. There are almost 400 active OSS distributions, and each year 26 new ones are born (Lundqvist 2013). Given the growing competition, distributions need to release new features and versions in an ever shorter time frame (Hertzog 2011; Remnant 2011; Shuttleworth 2008) to millions of desktop users and server installations. To achieve this, they rely on hundreds of volunteers to integrate the latest versions and bug fixes of the tens of thousands of integrated upstream components.

We empirically studied the major integration activities of three of the most popular and successful OSS distributions, i.e., Debian, Ubuntu and FreeBSD, using qualitative analysis on an accumulated 29 years of historical change and bug data. We document these activities and the steps used to perform them in a structured format, distilling the state-of-the-practice tools and processes followed by the actors involved in each activity, providing concrete examples, and comparing our findings to prior research and to integration outside the context of OSS. Six members of the maintenance community of the analyzed distributions discussed and refined the documented activities, and provided feedback on the usefulness and completeness of the activities. Similar to the concept of design patterns (Gamma et al. 1995) or reference architectures (Bowman et al. 1999), the documented activities can be used by (1) organizations as a common terminology for discussing and improving integration activities for components, and (2) researchers to set up a roadmap for research on integration, since integration remains a largely unexplored research area (Goode 2005; Hauge et al. 2010; Stol et al. 2011).

The main contributions of this paper are:

– Identification and documentation of seven major integration activities and the processes that they follow in three major OSS distributions.
– Identification of major challenges for tool support and research on integration activities.
– Evaluation of and feedback on the identified activities and challenges by six integration maintainers and release managers of the analyzed distributions.

This paper is structured as follows. First, Section 2 discusses background and related work on software integration and OSS distributions, after which Section 3 presents the design of our qualitative analysis.
Section 4 documents the seven integration activities that we identified during our analysis, followed by a discussion of the open challenges that we identified (Section 5) and the evaluation of our findings by six practitioners (Section 6). We conclude with threats to validity (Section 7) and the conclusion (Section 8) of our study.

2 Background and Related Work

This section discusses background and related work on integration and open source distributions. Table 1 summarizes key technical terms that will be used throughout the paper.

Table 1 Key technical terms used throughout the paper

| term | meaning |
|-----------------------|--------------------------------------------------------------------------|
| reuse | identification and integration of a component (e.g., class or library) into a system |
| OSS reuse | reuse of Open Source Software |
| COTS reuse | black box reuse based on Commercial Off The Shelf components |
| ISS reuse | reuse of Inner Source Software, i.e., OSS developed in-house |
| integrator | organization that integrates a third party component into its product |
| maintainer | individual or team doing physical integration on behalf of integrator |
| downstream project | synonym for "integrator" |
| upstream project | organization (open source project or company) whose components are being integrated by another project |
| upstream component | component developed by upstream project that is being reused |
| multi-component integration | integration of more than one upstream component |
| packaging organization | integrator whose business goal is to package upstream components into a coherent platform that is offered for sale or reuse |
| package | upstream component that has been integrated into an OSS distribution using the distribution's packaging format (e.g., "rpm") |
| binary distribution | distribution providing compiled code in its packages |
| source-based distribution | distribution providing source code in its packages, for compilation on the end-user's machine |
| derived distribution | "child" distribution that customizes packages of an existing "parent" distribution and adds additional packages to it |

2.1 Software Integration

Reuse can be black box or white box (Frakes and Terry 1996). Black box reuse refers to "Commercial Off The Shelf" (COTS) components (Boehm and Abts 1999), for which source code typically is not available. Hence, such components can only be configured and plugged into a target system. White box reuse provides access to the component's source code to customize it to the needs of the target system, either because the component is OSS (Spinellis et al. 2004) or because it is developed in-house following open source principles ("inner source", ISS), a practice that is increasingly common in large companies like Alcatel-Lucent, HP, Nokia, Philips and SAP (Stol et al. 2011). OSS and ISS reuse are also very common in the base platform of software product lines (van der Linden et al. 2007; Pohl et al. 2005; Van Der Linden 2009), since up to 95 % of such a platform consists of "commoditized" features readily available from upstream projects.

In general, software reuse creates a win-win situation for the reusing organization and the upstream project whose software is reused.
The former benefits from the features provided by the component in terms of productivity and product quality (Frakes and Kang 2005; Szyperski 1998), while the upstream project benefits financially (through licensing) and/or qualitatively from various forms of feedback, such as defect reports, code contributions and user experiences. However, despite the differences between COTS and OSS/ISS, all forms of reuse introduce a dependency on an upstream project (COTS/OSS) (Di Giacomo 2005; Hauge et al. 2010; Lewis et al. 2000; Mistrík et al. 2010; Morisio et al. 2002) or another division inside the organization (ISS) (Van Der Linden 2009), which can lead to hidden maintenance costs.

Software reuse has been studied extensively from the perspective of how to make a software system reusable (Coplien et al. 1998; DeLine 1999; Frakes and Kang 2005; Mattsson et al. 1999; Parnas 1976; Pohl et al. 2005), how to select components for reuse (Bhuta et al. 2007; Chen et al. 2008; Li et al. 2009), how to resolve legal issues regarding software reuse (German et al. 2010), and what factors can impact collaboration between the component provider and integrators (Brooks 1995; Curtis et al. 1988; Herbsleb and Grinter 1999; Herbsleb et al. 2001; Seaman 1996). In particular, Curtis et al. (1988) found, based on interviews, that the need to communicate outside the team, department or even company boundaries opens a can of worms (e.g., finger-pointing, silos of domain knowledge, limited communication channels, lack of contact persons and misunderstandings due to different contexts) that can negatively impact the integration process. Herbsleb and Grinter (1999) and Herbsleb et al. (2001) empirically showed that the need to involve more people indeed relates to the time necessary to resolve bugs and integration issues.

In contrast, the concrete activities involved in the integration of reused components, as well as their costs, have been studied in substantially less detail. Especially for multi-component integration, where not one but a potentially large number of (typically open source) components are being reused by an organization at the same time, empirical evidence is currently lacking (Morisio et al. 2002; Van Der Linden 2009; Ven and Mannaert 2008). Lewis et al. (2000) note that "The greater the number of components, the greater the number of version releases, each potentially coming out at different times." Hence, what kind of activities does such integration imply, and how do those activities relate to known activities for single-component integration? Before explaining how this study addresses these questions, we first discuss prior work on COTS, OSS and ISS reuse.

2.1.1 COTS Reuse and Integration

Brownsword et al. (2000) studied over 30 medium-to-large commercial projects to analyze the hidden integration activities of COTS reuse. They found that it is important for an organization to be informed about (new versions of) promising COTS components and to continuously monitor the impact of the components on the organization's code base. They also point out the maintenance issues of glue code and configuration of a COTS component, and the fact that projects do not control the upstream project. However, their findings are rather high-level, and do not explain how the projects coped with multi-component integration.

Lewis et al. (2000) report on their experience with COTS reuse in 16 government organizations.
They especially stress the loss of control as soon as a contract for COTS reuse is signed: any clause or adaptation that was not negotiated will result in additional costs down the line. Changing one's own system or looking for another COTS component is preferable to requesting (and having to pay) the component vendor to adapt her component. The main question on the studied organizations' minds was "How do we upgrade an operational system without a great deal of disruption?". There was no consensus on whether one should always update to the latest version of a reused component, wait until a new major version, or incorporate only the most pressing changes (e.g., security fixes). These questions were only exacerbated for organizations reusing dozens of components, which causes additional coordination issues.

A similar study was performed by Morisio et al. (2002) at NASA. Again, integration was the most costly aspect of COTS reuse, yet the integration activities varied widely across projects. Glue code was the main means of integration, and the authors note that most successful projects had to stay in contact with the COTS component provider throughout the lifecycle of the system to avoid surprises in the next version of the COTS.

2.1.2 OSS Reuse and Integration

Merilinna and Matinlassi (2006) performed a literature survey and structured interviews with nine small-to-medium Finnish companies that reuse OSS components. They found that integration problems are primarily due to the heterogeneous environments that components need to support, as well as the lack of documentation, forcing companies to rely primarily on their own experience. Merilinna and Matinlassi identified three ways to deal with integration problems: using an OSS component as a COTS component (no changes to the code), contributing changes back upstream, or using a packaging organization like an OSS distribution as mediator. Not upgrading to a new version of a reused component can also help. In any case, a thorough analysis of the OSS component to be reused can avoid many problems.

Ven and Mannaert (2008) performed interviews with members of a commercial project reusing OSS components, and examined in detail the trade-off between changing the code and contributing the changes back. Even though a project wants to avoid maintaining local changes (since this is costly), the alternative of contributing changes to the upstream project also requires an investment of time and resources, for example to get to know the contribution procedures and to keep track of the future evolution of the upstream project. Even if a patch is accepted by the upstream project, the organization developing the patch might still be required to maintain it, since only it has all the insight. Ven and Mannaert recommend contributing patches if the local changes are sufficiently generic, maintaining patches oneself if they are too specific, or (in the worst case) forking the upstream project, even though such a fork has only a small chance of success.

While Merilinna and Matinlassi (2006) and Ven and Mannaert (2008) identified two integration activities that we also identified in our study (i.e., Upstream Sync and Local Patch), we approached those activities from the perspective of a packaging organization (and multi-component integration) and documented them in a structured way.

2.1.3 ISS Reuse and Integration

Stol et al. (2011) studied the emerging practice of developing and reusing code in-house using open source practices (ISS).
ISS is a popular phenomenon in large companies, since it provides the benefits of OSS reuse without giving up control. Some companies only offer their employees the infrastructure for ISS reuse, while others make it part of their development strategy. A systematic literature study and a detailed study of ISS inside an organization show that the most costly ISS issues are due to integration. In addition to the integration issues related to OSS reuse in general, other challenges like backwards compatibility and the peculiar interplay between the ISS team and other teams in a company were identified. For example, the ISS team can send a "delivery advocate" to other teams to help them integrate the ISS components. However, various activities are company- and ISS reuse-specific. For example, the ISS team initially receives components from a specific team in the organization, but after integration becomes responsible for the components itself and starts acting as upstream for the other teams in the organization (even though the original developers still collaborate on the development of the component). In this paper, by contrast, OSS distributions and upstream projects are separate, independent entities.

Finally, Van Der Linden (2009) reports on the adoption of OSS and ISS reuse in software product lines (Meyer and Lehnerd 1997; Pohl et al. 2005). The platform on which such product lines are built largely consists of common functionality for which many components are available. Reuse of OSS and ISS components for such functionality improves the quality and speed of development; however, it also introduces a dependency on the upstream projects, not only for the platform, but for all products based on the platform. In addition to the best practices mentioned before, close collaboration with the upstream projects in a symbiotic fashion is key to keeping track of new features and changes, and can be established by reporting or fixing bugs. Although OSS distributions can be seen as a product line, our study focuses especially on the identification and structured documentation of major integration activities in the context of multi-component integration.

2.2 Open Source Distributions

This paper focuses on the maintenance activities involved in software integration in the context of OSS distributions, since this context enables us to study integration in a multi-component, open source setting. OSS distributions are among the most well-known open source packaging organizations (Gonzalez-Barahona et al. 2009; Ruffin and Ebert 2004). Such distributions integrate a collection of upstream software components consisting of an operating system kernel (e.g., Linux or BSD), core libraries, compilation tools and software for users like desktop applications and web browsers. Thanks to their inclusion in an OSS distribution, the integrated upstream projects can reach millions of users without having to market themselves. Although distributions are especially known in the Linux and BSD world, even commercial products like Microsoft Windows and Mac OS X can be considered distributions (they just ship with more ISS projects than OSS).

There are hundreds of OSS distributions, most of which integrate thousands of upstream components. Figure 1 shows that the total number of currently active Linux distributions has grown to 380 (in addition to 135 discontinued distributions, which are not shown), increasing by roughly 26 distributions each year (Lundqvist 2013).
For the BSD family of open source kernels, there are twelve currently active distributions (Comparison of BSD operating systems 2011), in addition to 22 distributions that are either discontinued or have an unclear status. The most popular Linux distributions like Debian and Ubuntu both integrate more than 24,000 OSS components, whereas FreeBSD (the most popular BSD distribution) integrates almost 23,000 components. The Debian distribution doubles in size every two years, having passed the mark of 300 MLOC in 2007 (Gonzalez-Barahona et al. 2009).

Despite this large scale, integrating an OSS project's components into a distribution goes far beyond black-box reuse. First, the upstream components need to be turned into a distributable "package". Distributions such as Debian, Ubuntu and Fedora compile the components for a particular architecture, then split up the compiled libraries and executables across one or more "binary" packages. Such packages (together with the packages they depend on) can be automatically installed using a distribution-specific package management system, such as "apt", "dpkg" or "yum". Source-based distributions, like FreeBSD, distribute the (possibly customized) source code of an upstream component to the end-user as a so-called "source" package (FreeBSD uses the term "port" for this), for compilation on the user's machine. Unless otherwise specified, the term "package" in this paper will refer to both "binary" and "source" (port) packages.
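To make the notion of a dependency-aware package system concrete, the following minimal sketch (ours, with hypothetical package names and dependencies; not the actual resolver of "apt", "dpkg" or "yum") computes the transitive dependencies of a package and an order in which a package manager could install them. Real package managers additionally handle version constraints, conflicts and virtual packages.

```python
# Minimal illustration of dependency-aware installation: compute the
# transitive dependencies of a package and a valid installation order.
# Package names and dependency lists are hypothetical.
DEPENDS = {
    "iceweasel": ["libgtk2.0-0", "libnspr4"],
    "libgtk2.0-0": ["libglib2.0-0"],
    "libnspr4": [],
    "libglib2.0-0": [],
}

def install_order(package, seen=None, order=None):
    """Return the package plus its dependencies, dependencies first."""
    seen = set() if seen is None else seen
    order = [] if order is None else order
    if package in seen:
        return order
    seen.add(package)
    for dep in DEPENDS.get(package, []):
        install_order(dep, seen, order)
    order.append(package)  # a package is installed only after its dependencies
    return order

print(install_order("iceweasel"))
# -> ['libglib2.0-0', 'libgtk2.0-0', 'libnspr4', 'iceweasel']
```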
After building and packaging the upstream component, the new package needs to be tested and delivered to the end-user. Once a package becomes available to end-users (including the integrators), the real integration maintenance work starts, since packages (and their dependent packages) need to be continuously updated to new versions of the packaged component. Similarly, bugs in the package should be detected and fixed promptly, and (if appropriate) patches should be sent back to the upstream project that developed the packaged component. Local changes to the package that have not been sent back, however, need to be maintained and kept up-to-date by the distribution. User complaints should be triaged and processed by the distribution as well, before escalating them upstream, if appropriate.

Organizations that reuse a component typically (Koshy 2013; Merilinna and Matinlassi 2006) appoint a person or group of people, i.e., the "maintainer(s)", to perform and coordinate integration activities on the organization's behalf. Organizations like OSS distributions dealing with multiple upstream projects and components typically have multiple maintainers, each one responsible for a group of related upstream components. Figure 2 shows the interactions of a distribution's maintainer (in bold) with the other major actors of the distribution. The maintainer packages and customizes the upstream software component herself, interacting with the upstream project whenever necessary, for example to understand changes in a new release or to communicate reported bugs. Customizations result in local patches applied to the vanilla upstream component, after which the patched component is packaged using the distribution's package management tool. The package is then tested by the project's package community, which consists of volunteering contributors and testers. Once stabilized, packages can also be used by end-users, who can contribute bug reports or suggestions by contacting the maintainer. The maintainer's work ultimately ends up in an official release of the distribution, hence all maintainers are coordinated by the release manager in charge. Some of the common activities of the release manager are discussing release-critical bugs or project-wide packaging policies with the maintainer, and enforcing deadlines.

Given the size of a distribution, most of the maintainers are responsible for multiple components (each of which is packaged into one or more packages). Debian has around 2,400 maintainers (Project participants 2013) for 24,000 integrated components (a ratio of 10 components per maintainer), while FreeBSD has around 400 maintainers (The FreeBSD developers 2013) for 23,000 components (a ratio of 57.5). Ubuntu only has around 150 maintainers (Ubuntu universe contributors team 2013; MOTU team 2013; Ubuntu core development team 2013) for 24,000 components (a ratio of 160), since most of its packages are inherited as-is from Debian, thus requiring less work. Given the high component-to-maintainer ratios, maintainers often team up to share package responsibilities, but even then, they still need to divide their attention and limited time across many components. In addition, the maintainers are not the developers of the packages that they are maintaining, which means that even more time is spent to fully understand changes or to contact the upstream developers about a change (Brownsword et al. 2000; Stol et al. 2011). Finally, various proposals have been launched to shorten the time frame between releases of distributions (Hertzog 2011; Remnant 2011) or even to synchronize releases with those of other distributions (Shuttleworth 2008). This further complicates the task of the package maintainers.

This paper identifies and documents the integration activities that must be performed on a daily basis by the maintainers of three of the most successful OSS distributions. Previous research has focused exclusively on the other stakeholders in Fig. 2: the governance processes of distributions (Sadowski et al. 2008), release management (Michlmayr et al. 2007; van der Hoek and Wolf 2003), the package/developer community (Scacchi et al. 2006), the (evolution of the) size and complexity of packages (Gonzalez-Barahona et al. 2009), and the dependencies of packages (German et al. 2007). Given the central role of package maintainers in the success of a distribution, their responsibilities and challenges need to be understood in order to streamline the interaction between the OSS distribution and the upstream project, and to bring new maintainers quickly up to speed. Furthermore, previous work focused especially on the integration of individual components, while packaging organizations like OSS distributions need to deal with the integration of thousands of components at the same time, with their users expecting the latest versions of each component to be integrated. Finally, open source development forces organizations to collaborate with external parties to reap the full benefits of quality and innovation that can be achieved with open source components. If not, organizations waste substantial effort, for example to maintain their own local patches.
Hence, studying the integration activities of distributions will help us understand integration in a multi-component, open source context.

The following section presents the approach that we followed to identify and analyze the major integration activities in three large OSS distributions.

3 Case Study Setup

The goal of this paper is to empirically identify and document the major integration activities in use by packaging organizations for multi-component OSS integration, as existing empirical work has focused exclusively on single-component integration. Since a wide range of packaging organizations exists, as a first step we focus on some of the most experienced integration experts in the area of OSS reuse, i.e., OSS distributions. In particular, we perform a qualitative analysis on three of the largest and most successful OSS operating system distributions, i.e., Debian, Ubuntu and FreeBSD.

Although our results consist of integration activities performed in OSS distributions, these activities are not unique to OSS integration, nor are they just a subset of the integration activities performed by commercial organizations. Whereas in a commercial setting organizations used to buy or develop all dependencies themselves, an OSS setting requires one to collaborate with a variety of external stakeholders to avoid being stuck with one's own patches and customizations. Avoiding this requires a different set of integration activities than before. In fact, those activities now need to trickle back into the commercial organizations that have started to adopt OSS practices internally (ISS reuse).

To help such organizations, as well as open source projects, this paper addresses the following question: What is the core set of activities in OSS for dealing with the integration of multiple 3rd party components? This question allows us to empirically study what is being done in OSS integration, how it is being done and what challenges expert integrators still face. In particular, it also helps us understand the state-of-the-art techniques in use by OSS projects to facilitate their integration activities.

This section discusses the methodology of our study, which is also illustrated in Fig. 3. We first performed a qualitative analysis to identify and document major integration activities, then evaluated these findings with stakeholders from the three distributions.

Fig. 3 Overview of our case study methodology

3.1 Subject Selection

To obtain a representative sample, we selected a mixture of binary and source-based, and derived and independent OSS distributions. A derived (or "child") distribution automatically inherits the packages of its "parent" distribution. It then customizes some of those packages, and also adds its own packages, in order to enforce a uniform look-and-feel, focus on specific types of packages or specialize for a certain set of users (e.g., office workers vs. music producers). Although a derived distribution saves substantial integration time, it also leads to a unique set of integration activities, since each level of derivation adds an additional layer to the integration process.

When looking at the history of open source distributions (Lundqvist 2013), Debian and Ubuntu clearly stand out as two of the most influential distributions: 41.0 % of all distributions derive from Debian (211 out of the 380 active plus 135 discontinued distributions), 90 from Ubuntu and 17 from FreeBSD.
In particular, the Debian distribution has 81 child distributions, 105 distributions deriving from those child distributions ("grand-children"), 24 great-grand-children and 1 great-great-grand-child (Lundqvist 2013). The latter potentially needs to integrate packages from its four ancestors as well as from some upstream OSS projects directly. Ubuntu itself has 79 children and 11 grand-children (Lundqvist 2013), while FreeBSD has 15 children, 1 grand-child and 1 great-grand-child (Comparison of BSD operating systems 2011).

We found that the impact of the above distributions on other distributions also translated well to their popularity in terms of number of users. In contrast to mobile app stores, there is no official popularity poll or ranking of OSS distributions. However, since May 2001 one of the leading sources on OSS distributions has been the distrowatch.com web site, which contains announcements of new versions of distributions as well as detailed historical overviews of each distribution (either Linux- or BSD-based). One of its major features is that, on a weekly basis, the site keeps track of how many people search or click for each distribution. Although this ranking does not map 1-to-1 to the number of downloads, it does give an important indication of the popularity of OSS distributions.

Despite its age (the first Debian release was made on the 16th of August, 1993), Debian was still the fourth most popular binary distribution at the time of our case study, while Ubuntu was the second most popular binary and derived distribution. We decided not to study the top binary distribution at the time of our case study (i.e., Linux Mint), since it was a rather recent distribution derived from Ubuntu, without sufficient historical data available. The third most popular distribution was Fedora, but since this distribution is independent of the Debian/Ubuntu ecosystem, we did not study it either. As source-based distribution, we picked the most popular source-based BSD distribution, i.e., FreeBSD. Note that FreeBSD is also the most popular BSD distribution in general according to the 2005 BSD Usage Survey (The BSD Certification Group 2005).

3.2 Data Sampling

We study integration activities by systematically analyzing, categorizing and revising historical package data for Debian, Ubuntu and FreeBSD to create a classification of integration activities. Given the large number of packages and package-versions in the three distributions (Table 2), we could not examine all of them manually. Instead, for each distribution we sampled enough package-versions to obtain a confidence interval of length 5 % at a 95 % confidence level, taking into account the large population size (Cochran 1963):

\[
\text{sample size} = \frac{ss}{1 + \frac{ss}{\#\text{pkg. versions}}},
\qquad\text{with}\qquad
ss = \frac{Z^2 \cdot p \cdot (1 - p)}{0.05^2},
\]

where \( Z = 1.96 \) for a 95 % confidence level and \( p = 0.5 \) for a population with unknown variability.

This means that if we find an integration activity to hold for \( n \) % of the sampled package-versions, we can say with 95 % certainty that \( n \pm 5 \) % of all package-versions exhibit that activity. For example, \( 7 \pm 5 \) % would mean that the activity holds, with 95 % certainty, for 2 % to 12 % of the package-versions. Although the three distributions have a different number of package-versions, the asymptotic nature of the sample size formula yielded the same number of package-versions (384) for each distribution.

Table 2 Characteristics of the data for the three subject distributions

| | Debian | Ubuntu | FreeBSD |
|----------------|--------------|--------------|--------------|
| start of project | 16/08/1993 | 20/10/2004 | 11/1993 |
| start of data | 12/03/2005 | 20/12/2005 | 21/08/1994 |
| end of data | 16/08/2011 | 14/09/2011 | 01/09/2011 |
| #components | 24,263 | 25,345 | 22,733 |
| #packages | 92,277 | 66,595 | 22,733 |
| #pkg. versions | 896,757 | 446,324 | 162,135 |
| #releases | 4 | 14 | 8 major/55 minor |
| #maintainers | 2,400 | 150 | 400 |
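As an illustration, the following minimal sketch reproduces this sample-size computation for the package-version counts in Table 2, assuming (as is conventional) that the result is rounded up to the next integer:

```python
import math

# Cochran's sample-size formula with finite-population correction,
# applied to the package-version counts from Table 2.
Z, p, e = 1.96, 0.5, 0.05  # 95 % confidence, maximal variability, 5 % interval

def sample_size(population):
    ss = Z ** 2 * p * (1 - p) / e ** 2         # ss = 384.16
    return math.ceil(ss / (1 + ss / population))  # correct for population size

for name, n in [("Debian", 896757), ("Ubuntu", 446324), ("FreeBSD", 162135)]:
    print(name, sample_size(n))  # prints 384 for each distribution
```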
3.3 Data Extraction

We randomly sampled 384 package-versions from each distribution, then automatically extracted for each selected package-version the corresponding change log message. Such a change log basically consists of a detailed bullet list (Koshy 2013) containing a high-level, textual summary of all major changes in a particular package-version, as well as the explicit IDs of all fixed bugs. Figure 4 shows an example change log message of a Debian package-version (Ubuntu and FreeBSD use a similar format). Except for two changes, all changes in Fig. 4 fix open bug reports, with the reports' identifiers pasted inside the change log. As distributions stipulate that each new package-version has to be documented in a change log (Debian project 2011), we used change log data as the starting point for the analysis of each package-version.

To interpret the changes reported in a change log, we then manually analyzed the referenced bug reports via the distributions' bug repository. As explained below, each distribution uses a different technology for its change logs and bug repository, but we were able to write scripts to automate the fetching of both the logs and the reports. The bug reports often contained references to emails on a distribution's mailing lists, and sometimes contained patches that had been proposed as a possible bug fix. If present, we also studied these messages and patches. Finally, to clarify technical terms or understand particularly unclear bugs or changes, we used the distribution's developer documentation (accessible from a distribution's web site) and, in the worst case, any relevant web search, especially for finding relevant communication on online fora. This was only necessary in a small number of cases.

We now discuss how we obtained the above data for each of the three distributions. This data can be found online in the paper's replication package (Adams et al. 2015). For Debian, we obtained the names of all integrated components across Debian's entire history from the so-called snapshot archive¹. This is a server containing all versions of all packages over time, allowing scriptable access via a public JSON-based API. Then, for every integrated component, we retrieved all version numbers, their timestamps and the list of binary package names associated with the component (since a component can be split across multiple packages). After sampling 384 package-versions, we downloaded the corresponding change logs using a simple script from Debian's change log repository². Bug reports mentioned in the change logs can be found in the bug repository using the bug identifier³. Related email messages and other data mentioned in the bug reports were found by using a web search.
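The following sketch illustrates this kind of scriptable access to the snapshot archive. The "/mr/" ("machine-readable") endpoint and the response fields shown here are our recollection of the archive's JSON API and should be treated as assumptions rather than a stable contract:

```python
import json
import urllib.request

def package_versions(source_package):
    """Query the snapshot archive for all recorded versions of a source
    package (endpoint path assumed; see footnote 1 for the archive URL)."""
    url = f"http://snapshot.debian.org/mr/package/{source_package}/"
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    # Assumed response shape: {"package": ..., "result": [{"version": ...}, ...]}
    return [entry["version"] for entry in data["result"]]

print(package_versions("openssl")[:5])  # first five entries returned
```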
For Ubuntu, we used the Python API of the Launchpad collaboration platform⁴ to retrieve the names and version numbers of all Ubuntu packages that have ever existed. Because Ubuntu is derived from Debian, we filtered the Ubuntu packages to include only the ones customized by Ubuntu, since the other packages are identical to Debian packages. Ubuntu-customized packages have a version number ending in "-MubuntuN", where "M" and "N" are numbers following a special convention. We found 133,311 such package-versions, belonging to 26,858 packages. Except for a different location of the change logs⁵ and bug reports⁶, we used the same approach for data extraction as for Debian.

For FreeBSD, data extraction was a bit more involved, since FreeBSD is a source-based distribution. For this reason, we retrieved a copy of the FreeBSD version control system (CVS)⁷,⁸, which contains all local file changes ever made to all reused components. Since such CVS changes are too fine-grained to be considered a "version", but releases are too coarse-grained (multiple port versions can exist in between two official releases), we had to reconstruct the port versions by grouping related CVS changes together. For this, we used the FreeBSD convention that each port's Makefile is expected to have a PORTREVISION variable that is changed "each time a change is made to the port which significantly affects the content or structure of the derived package" (FreeBSD porter's handbook 2011). If a maintainer does not change the PORTREVISION (nor the related PORTVERSION variable), the corresponding changes are not deemed important enough to be automatically picked up by users during an update of their installation. We interpret this as "changes that do not change the PORTREVISION variable do not define a new port version", similar to the definition of "version" for binary packages.

In practice, we determined for each port the timestamps of all changes that change PORTREVISION and/or PORTVERSION, then grouped all changes to a port's files between two consecutive PORTREVISION changes (excluding the first PORTREVISION change) into one port version. We treated all changes up to and including the first Makefile revision as the first PORTREVISION, to account for the initial import of a port. We wrote scripts that queried the CVS repository for all commit log messages between the start and end date of a port version. The change logs of the resulting port versions then correspond to the concatenation of these commit log messages. Finally, bug reports were obtained from FreeBSD's bug repository⁹ based on the bug identifiers mentioned in the change logs.

---

1. http://snapshot.debian.org/
2. http://packages.debian.org/changelogs/pool/main
3. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=XYZ, with XYZ the bug identifier
4. http://api.launchpad.net/1.0/
5. http://changelogs.ubuntu.com/changelogs/pool/main
6. Manual search using the bug identifier on https://bugs.launchpad.net/ubuntu
7. ftp3.ie.FreeBSD.org::FreeBSD/development/FreeBSD-CVS/ports/
8. pserver:anoncvs@anoncvs.tw.FreeBSD.org:/home/ncvs
9. http://www.FreeBSD.org/cgi/query-pr.cgi?pr=XYZ, with XYZ the bug identifier
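The following sketch (with a hypothetical commit history; not our actual extraction scripts) illustrates the grouping heuristic described above: every commit that bumps PORTREVISION or PORTVERSION closes the current port version, and a version's change log is the concatenation of its commit log messages. For simplicity, the sketch folds the special treatment of a port's initial import into the same rule:

```python
def reconstruct_port_versions(commits):
    """commits: chronological (timestamp, bumps_revision, message) tuples."""
    versions, current = [], []
    for _timestamp, bumps_revision, message in commits:
        current.append(message)
        if bumps_revision:  # commit changes PORTREVISION or PORTVERSION
            versions.append("\n".join(current))  # change log = joined messages
            current = []
    return versions  # trailing commits without a bump form no new version

commits = [  # hypothetical CVS history of one port
    ("1994-08-21", True,  "Initial import of port."),
    ("1994-09-02", False, "Fix typo in pkg-descr."),
    ("1994-09-10", True,  "Update to upstream 1.1; bump PORTVERSION."),
]
print(reconstruct_port_versions(commits))  # two reconstructed port versions
```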
3.4 Data Analysis

Since we did not have any classification of integration activities to start from, the first author initially studied the Debian distribution as a pilot project. He manually interpreted the changes documented in the change log of each sampled package-version, then looked up the bug reports referenced by the change log in order to understand which bugs had been resolved or which features had been added, and how this was done. For the latter, the bug reports' comments were an important source of information. To fully understand the scope and context of more complex changes, he sometimes had to consult email messages referenced by the bug reports and patches attached to them. In case of doubt or usage of unfamiliar technical terms or inside stories, the distribution's developer documentation was consulted or, in the worst case, a web search was performed.

Once it was clear what exactly the integrators had done to produce the analyzed package-version, the package-version was tagged with every observed activity, to summarize the rationale behind the version. Two examples of activities could be "new release" or "package dependency change". More than one tag could be assigned to a version, since a new version of a package typically consists of multiple changes (as seen earlier in Fig. 4). By repeating this procedure for all sampled Debian versions, and constantly revising already analyzed versions when new tags were found, an initial tagging schema was built up, representing the different activities that go into a package-version.

After finishing the pilot project on Debian, the first two authors revised the obtained tagging schema, leveraging the second author's experience as a Debian/Kubuntu maintainer and developer. Some tags were merged, others were renamed, and with the resulting tagging schema in hand, we revised the Debian analysis to standardize the tags used. Afterwards, both authors analyzed the Ubuntu and FreeBSD data using the same tagging schema as a starting point (and using the same approach as for Debian). Conflicts in tagging between both authors were resolved through discussion. We did not find additional tags for Ubuntu and FreeBSD, giving us confidence in the completeness of our initial tagging schema. Eventually, we obtained seven very popular tags, two less popular ones and a catch-all tag for multiple unique or less frequent activities unrelated to any of the other tags. We excluded the latter three tags from our analysis, but we come back to them in Section 6. The replication package (Adams et al. 2015) contains the tags and noteworthy observations for the sampled package-versions.
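To make the tagging bookkeeping concrete, the following minimal sketch (with hypothetical package-versions and tags; not our actual analysis scripts) shows how a mapping from package-version to a set of activity tags yields both per-tag prevalence (as reported later in Table 3) and tag co-occurrence (which underlies the "Interactions" fields of Section 4):

```python
from collections import Counter
from itertools import combinations

tags = {  # package-version -> set of activity tags (hypothetical sample)
    "foo_1.0-2": {"Upstream Sync", "Packaging Change"},
    "bar_2.3-1": {"New Package", "Local Patch"},
    "baz_0.9-4": {"Upstream Sync", "Dependency Management"},
}

n = len(tags)
prevalence = Counter(tag for tagset in tags.values() for tag in tagset)
cooccurrence = Counter(pair for tagset in tags.values()
                       for pair in combinations(sorted(tagset), 2))

for tag, count in prevalence.most_common():
    print(f"{tag}: {100 * count / n:.1f} %")  # per-tag prevalence
print(cooccurrence.most_common(3))            # most related activity pairs
```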
3.5 Identification and Documentation of Activities

The seven most popular tags obtained after the manual analysis all correspond to unique integration activities; however, each distribution could have its own terminology and workflow for such an activity. Hence, in order to abstract the commonalities and variabilities across distributions for a particular activity (tag), all authors together distilled the intent, motivation, common tasks and current practices across the distributions based on (1) the information that we encountered in the change logs, bug reports and mailing lists for the sampled package-versions, as well as (2) the second author's experience as a Debian/Kubuntu developer. This was an iterative process, trying to separate the essential steps of an integration activity from implementation details or exceptions in a particular distribution. Typically, each author would refine one or two patterns, then send them to the next author for further refinement, until no more changes were made to an activity.

Similar to design patterns (Gamma et al. 1995), we then "captured [the activities] in a form that people can use effectively". For each integration activity, we documented in a structured format its intent, motivation, the major tasks involved in the activity, its participants, possible interactions with other activities and notable instances of the activity in the three studied distributions (Debian, Ubuntu and FreeBSD). Interactions are based on the co-occurrence of activities in our data. We also tried to compare each activity to prior work in the integration literature, to put each activity in context.

During the tagging of integration activities, and their abstraction into pattern form, the authors encountered recurring issues and problems faced by the package maintainers. Such issues and problems were noted down by each author individually, then compared and clustered to obtain a set of challenges across four research areas. After filtering out challenges that were already addressed by related work, we obtained 13 concrete challenges or limitations that, based on our data, seemed to hold back maintainers in their activities. To cross-check those challenges, together with the activities that we documented, we performed a validation with practitioners in the next step.

3.6 Validation of the Activities by Practitioners

In order to get feedback on the correctness and usefulness of the documented integration activities and challenges, we contacted members of the package maintenance community of Debian, Ubuntu and FreeBSD. We asked them to (1) verify the correctness of the activities that we derived and abstracted from the change log, bug report and other historical data, as well as of the challenges that we uncovered, and to (2) provide feedback on the usefulness of the activities, as well as on the activities and challenges that we might have missed while analyzing the sampled package-versions.

Based on their extensive experience with the three distribution communities, the second and fourth author first compiled a short-list of package maintainers and release engineers experienced with maintaining large packages. We then contacted the people on the short-list by email, since email is the preferred channel of communication for maintainers (maintainers are volunteers spread across the world, without a fixed office). We also considered filing a bug report for our study, since maintainers closely track the bug repository of their packages; however, since bug reports are a public broadcast medium, where people would have been able to chime in and perhaps influence the maintainer, we discarded the bug repository for our purposes.

We eventually received feedback from three maintainers (M1, M2 and M3) active in both Debian and Ubuntu, one (M6) in Debian, one (M5) in Ubuntu, and one (M4) in FreeBSD. All of them have at least five to ten years of experience, since the role of package maintainer or release engineer can only be earned through years of active involvement in a distribution.
Note that, to respect their anonymity, we will refer to all of them as "maintainers" and use symbolic names.

When contacting the maintainers, we provided them with a draft of this paper, then asked them for feedback about the documented activities and challenges. In particular, we asked the following questions to evaluate the usefulness and completeness of the activities and challenges:

Q1 What activities did we miss?
Q2 What can the documented activities be used for?
Q3 Which existing tools and techniques for these activities did we miss?
Q4 What challenges did we miss?
Q5 What promising tools/techniques do you see coming up to address some of the challenges?

The maintainers replied to the five questions by email. All six also provided higher-level comments about the paper, with one maintainer providing an annotated PDF with more detailed comments. Despite their busy schedules and the asynchronous nature of email communication (one cannot force someone to reply), only two maintainers left two or more questions blank. We come back to this in Section 6. The email replies were then analyzed by two of the authors and summarized into a table (Table 5) in order to compare the findings across all six maintainers.

At a high level, the obtained feedback showed us whether the activities as a whole made sense, whereas at a lower level it exposed inaccuracies, missed workarounds and factual errors. We then used this feedback to flesh out the descriptions of the seven documented activities and the 13 challenges, to obtain the final version of the activities documented in the present paper. The contacted members suggested five additional activities; however, since we did not have sufficient empirical support for these activities in our data sample, we did not add them to the documented activities. Instead, we discuss those additional activities in Section 6.

Table 3 Overview of integration activities and their prevalence in the three distributions. The last three activities (H, I and J) were not common enough to be documented

| Activity | Explanation | % Deb. | % Ub. | % Fre. |
|------------------------|--------------------------------------------------|--------|-------|--------|
| A. New Package | Integrating a new software project. | 1.04 | 0.78 | 13.54 |
| B. Upstream Sync | Updating to a new upstream version. | 40.89 | 43.75 | 57.81 |
| C. Dependency Management | Managing changes to dependencies. | 38.80 | 30.73 | 28.39 |
| D. Packaging Change | Changing a package's packaging logic. | 43.49 | 44.01 | 38.80 |
| E. Product-wide Concern | Enforcing policies across all packages. | 4.95 | 3.13 | 25.00 |
| F. Local Patch | Patching upstream source code locally. | 22.40 | 28.39 | 12.24 |
| G. Maintainer Transfer | Managing unresponsive maintainers. | 5.73 | 0.00 | 2.86 |
| H. Security | Patching a security vulnerability. | 4.43 | 1.30 | 0.78 |
| I. Internationalization | Internationalization of packages. | 4.17 | 1.56 | 0.26 |
| J. Other | Catch-all for rare activities. | 2.34 | 4.95 | 1.04 |

4 Integration Activities in Distributions

Table 3 gives an overview and a short explanation of the seven major integration activities that we documented, as well as of three less common ones. The table also provides the percentage of sampled Debian, Ubuntu and FreeBSD package-versions that involve each activity (within a confidence interval of length 5 %). Those numbers are also plotted in Fig. 5.
Since a new version of a component can involve multiple integration activities, the percentages in the plots add up to more than 100 %. Upstream Sync, Dependency Management and Packaging Change are the most frequently occurring activities in Debian and FreeBSD. Local Patch is also common in all three projects, whereas New Package and Product-wide Concern are common in FreeBSD.

Fig. 5 Popularity of the integration activities of Table 3 in the 384 sampled (a) Debian, (b) Ubuntu and (c) FreeBSD package-versions (confidence interval of length 5 % at a 95 % confidence level)

The next subsections discuss each of the seven major integration activities in detail. For each activity, we provide:

- **Intent**: a short outline of the goal of the activity.
- **Motivation**: a short description of the role and rationale of the activity.
- **Major tasks**: the major steps involved in the activity.
- **Participants**: a list of stakeholders from Fig. 2 involved in the major tasks of the activity.
- **Interactions**: activities that co-occurred substantially with the given activity in package-versions, and hence are related.
- **Literature**: a discussion of prior work and approaches for the activity, as well as the prevalence of the activity outside the context of OSS distributions.
- **Notable instances**: concrete examples of the activity from the sampled Debian, Ubuntu and FreeBSD package-versions.

A. New Package

**Intent:** Integrating a previously unpackaged upstream component into a distribution.

**Motivation:** The users of the distribution or the maintainer of a package require new functionality provided by a component that has been identified but is not yet part of the distribution.

**Major Tasks:**

1. Recruiting a Maintainer responsible for integrating the new component and for liaising with the upstream project is one of the most important decisions to take (Koshy 2013; Merilinna and Matinlassi 2006). Most commonly, an upstream developer or motivated end-user requests an upstream component to be integrated into the distribution. One of the distribution's maintainers might pick up this request and become the maintainer. Alternatively, the upstream developer can package the component herself and ask a distribution maintainer to "sponsor" this package, i.e., to review it and to upload it to the distribution's package repository. In that case, although the majority of the integration is done upstream, the maintainer still has the end responsibility. Another possibility is that the distribution appoints a maintainer for the integration of a new component because of a clear need in the distribution.

2. Packaging an Upstream Project requires access to the project's source code (except for binary-only packages like Adobe Flash) and verification of its license. The maintainer then proceeds to determine the build-time and run-time dependencies of the package. If a dependent component is not yet in the distribution, it has to be packaged first. This is a process of trial-and-error, trying to build the package and fixing any dependency problems. The maintainer might have to customize the software or its makefiles so that it builds correctly in the environment of the distribution. When porting the package to platforms other than Linux- or GNU-based ones, it is often necessary to remove dependencies on Linux- or GNU-specific libraries or functionality. This can take significant effort.
Finally, the maintainer needs to make sure that the package follows the distribution's policies, such as specific locations for configuration files and manual pages.

3. Creating the Package's Metadata. The maintainer is responsible for creating the package metadata, like the package name, version number and the list of dependent packages. Such metadata is necessary to add the package to the distribution's package management system ("apt" in Debian/Ubuntu, or the ports system in FreeBSD) and enables the automatic and systematic building, packaging, and deployment of the software project.

4. Integration Testing. The package must build and run consistently on all supported architectures. Typically, two rounds of tests are used to verify a package. The first round involves only maintainers, who iron out any obvious functionality or platform issues. The second round involves uploading the package to a staging area (e.g., "unstable" in Debian), from where expert end-users can install it for use in their daily work. Bugs identified by these users are reported (together with possible patches) to the maintainer, who incorporates this feedback in a new version of the package that is re-uploaded. Some distributions, like Ubuntu, have tools to automatically run integration tests and identify integration issues.

5. Publishing the Package. If a staged package contains severe bugs, it might be (temporarily) removed from the staging archive until the bugs are resolved. If the package has been stable for a certain period of time, it becomes eligible for inclusion in an upcoming release. The package is either moved to that release's archive (Debian/Ubuntu), or to the source code repository (FreeBSD).

**Participants:** maintainer, upstream developer, package community, expert end-user.

**Interactions:** New Package is a prerequisite of the other six activities, and usually occurs by itself (i.e., a package-version only involves New Package, and no other activity). In 2.3 ± 5 % of the FreeBSD package-versions, it also involves Local Patch to fix a bug or to make the package compile.

**Literature:**
In the context of COTS reuse, additional tasks are involved, especially contract negotiations (Information Technology Resources Board 1999; Navarrete et al. 2005). Lewis et al. (2000) note that "Vendors are driven by profits [...] They can be cooperative and responsive when it is in their perceived interest to be so." Various guidelines and risk assessment tools exist to help companies or federal departments select the right COTS components (Information Technology Resources Board 1999; Lewis et al. 2000). They recommend, for example, finding COTS components that fit the existing architecture, or possibly adjusting the architecture first, rather than requiring the COTS vendor to customize its component to the system at hand (since that could be very costly). This is different for OSS distributions, where monetary incentives typically do not exist and distributions sometimes carry enough weight to convince upstream projects to adapt to them rather than the other way around.

Although not applicable in the case of packaging organizations like OSS distributions, the identification of COTS/OSS components for reuse is a known challenge as well (Morisio et al. 2002; Stol et al. 2011), typically requiring extensive web or literature research, or insightful recommendations by experts.
While maintainer recruitment and integration testing are established research topics, the other tasks have received far less attention from researchers.

**Notable Instance:**
A New Package with customization: irssi-plugin-otr (Ubuntu) is an IRC client plugin integrated in July 2008. A first customization changed the location for documentation to the Ubuntu default location. The second customization fixed the package's build process to not download required header files during the build, since the Ubuntu build servers do not have network access.

B. Upstream Sync

**Intent:** Bringing a package up-to-date with a newer version of the upstream component.

**Motivation:** As shown in Fig. 5, synchronizing the existing packages of a distribution with a newer upstream version forms the core activity of integration. End-users expect package maintainers to update their packages to the latest features and bug fixes as soon as possible, while maintainers are more concerned about the long-term stability of a package.

**Major Tasks:**
1. Becoming Aware of a New Upstream Release largely depends on distribution-specific dashboards that automatically track the development progress of upstream projects. For example, Debian's watch file mechanism specifies (1) the URL of the upstream project's download page with all releases of a component, as well as (2) a regular expression to identify the source code archive and version number of each release. If the highest version number surpasses the currently packaged version, a new release is available (a minimal sketch of this mechanism is shown below).

Derived distributions (e.g., Ubuntu) not only need to synchronize with the upstream projects, but also with their own parent distribution, typically at the start of a new release cycle. For example, out of 167 analyzed Ubuntu package-versions involving Upstream Sync, 99 versions were synchronized with the upstream project, 65 were synchronized with the parent distribution (Debian) and 3 were synchronized with both. Since the derived distribution can leverage the Upstream Sync and other activities performed by the maintainers of the parent distribution, risk assessment (task 2) becomes slightly easier. However, keeping track of which patch was synchronized from which upstream project requires rigorous book-keeping. Projects use custom dashboards for this, sometimes interfacing with the bug reporting infrastructure.

2. Assessing the Risk of an Upstream Release requires the maintainer to review the changes relative to the previous upstream version (Rodin and Aoki 2011) in order to estimate whether the new version is production-ready. These changes run the risk of breaking important functionality, while end-users do not always need the new features and bug fixes. Despite the importance of this analysis, in practice it is currently a largely manual task supported by basic tools like "diff" (Rodin and Aoki 2011), change and commit log messages, email communication with upstream developers, and experience.

The outcome of risk assessment is often to not update to a full new release, but to "cherry-pick" a select number of acceptable changes out of all changes made upstream or by another distribution, then merge those changes into the current package-version (discarding the other changes). For example, an upcoming release of a distribution might be imminent, making the full import of a new version of a component too risky. Instead, maintainers would cherry-pick the show-stopper bug fixes that they are most interested in.
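As announced in task 1, the following sketch illustrates the watch idea: fetch a download page, extract version numbers with a regular expression, and compare against the packaged version. The URL, tarball pattern and version numbers are hypothetical, and the numeric version comparison is a simplification of dpkg's far richer algorithm.

```python
#!/usr/bin/env python3
"""Sketch of the watch-file idea: scan an upstream download page for
release tarballs and flag a newer version. URL, pattern and versions
are hypothetical."""
import re
import urllib.request

DOWNLOAD_PAGE = "https://example.org/foo/releases/"    # hypothetical
TARBALL = re.compile(r"foo-(\d+(?:\.\d+)*)\.tar\.gz")  # captures the version
PACKAGED_VERSION = "1.4.2"                             # version in the distribution

def version_key(version):
    """Naive numeric ordering; dpkg handles many more corner cases."""
    return tuple(int(part) for part in version.split("."))

page = urllib.request.urlopen(DOWNLOAD_PAGE).read().decode("utf-8", "replace")
releases = TARBALL.findall(page)
if releases:
    newest = max(releases, key=version_key)
    if version_key(newest) > version_key(PACKAGED_VERSION):
        print(f"new upstream release: {newest} (packaged: {PACKAGED_VERSION})")
```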
Some distributions, like FreeBSD, prefer not to cherry-pick, i.e., they either take a new version of a component as a whole, or do not update to it.

3. Updating Customization involves revisiting the customizations (patches) performed on earlier versions of the packaged component (e.g., during the initial New Package or later Local Patch activities). Maintainers typically submit these patches upstream, to be merged. As a consequence, some patches no longer need to be maintained locally and can be discarded by the maintainer. Other patches, however, need to be updated by the maintainer to apply cleanly to the new version of the upstream package. Just like task 2, this requires manual analysis of the patch and the new package-version.

4. Updating the Package's Metadata, cf. task 3 of New Package.
5. Integration Testing, cf. task 4 of New Package.
6. Publishing the Package, cf. task 5 of New Package.

Participants: maintainer and upstream developer.

Interactions: Upstream Sync is a pivotal activity that can be accompanied by any other activity, except for New Package (by definition). Upstream Sync occurs mostly together with Packaging Change, Dependency Management, Local Patch and (in source-based distributions) Product-wide Concern.

Literature:
Together with Local Patch, Upstream Sync is the most discussed integration activity in the literature, independent of the type of reuse (COTS/OSS/ISS) or organization (OSS/commercial) (Lewis et al. 2000; Navarrete et al. 2005), and it is the source of most of the issues related to Dependency Management (sometimes even preventing Upstream Sync of other packages). For example, Begel et al. (2009) report that at Microsoft up to 9 % of 775 surveyed engineers rely on other teams to inform them of changes to a component they rely on. Researchers (Merilinna and Matinlassi 2006; de Souza and Redmiles 2008) and practitioners (Koshy 2013) recommend continuously monitoring (or inquiring about) new versions and their impact on the software system, even appointing a specific gatekeeper responsible for doing this. This also helps mitigate one of the largest risks of reuse: the component vendor going out of business (Lewis et al. 2000).

Since reuse induces a dependency on the provider of a COTS/OSS/ISS component (who fully controls the component's evolution (Lewis et al. 2000)), researchers have reported two extreme approaches to deal with this dependency: swiftly updating to each new component version (Brownsword et al. 2000; Stol et al. 2011; Van Der Linden 2009) versus sticking to a particular version and patching it for the organization's particular needs (Merilinna and Matinlassi 2006; Ruffin and Ebert 2004; Van Der Linden 2009). There is no systematic methodology for deciding between these two extremes and hybrid approaches in between, like cherry-picking (Lewis et al. 2000); typically, personal experience is the deciding factor (Merilinna and Matinlassi 2006), while other factors, like the safety-critical nature of a software system, can play a role as well (Lewis et al. 2000). Interestingly, many integration issues could in fact be avoided if the new component version were backwards compatible with the previous version (Crnkovic and Larssom 2002; Stol et al. 2011), but this is outside the control of the organization that reuses a component.
Notable Instances:

A low-risk Upstream Sync: Gnash (Ubuntu) is a Flash player that was updated to upstream version 0.8.7 in March 2010 (#522254),\textsuperscript{10} right at the start of the Ubuntu feature freeze window (i.e., close to the next release). Since new features are technically not allowed in a freeze window, a member of the Ubuntu release team needed to explicitly approve the Upstream Sync. As Gnash is a package inherited from Debian, and the update mostly contained bug fixes, version 0.8.7 quickly got synced.

An Upstream Sync taking a long time: Krita 2.1.1-1 (Debian), the painting program of the KOffice suite, was broken early May 2010 because one of the libraries it depends on (libkdcraw7) had been replaced by a newer version (libkdcraw8) in an Upstream Sync of KDE 4.4.3 (#580782). Unfortunately, the solution (an Upstream Sync to KOffice 2.2.0) took two months, because this new version of KOffice introduced too many new functionalities, requiring the package to be tested more thoroughly.

A patch cherry-picked from another distribution: libpt 1.10.10 (Ubuntu), a cross-platform library, relied on the new gspca webcam driver provided by the 2.6.27 Linux kernel. For this driver to work, all programs and libraries consuming the webcam stream now had to load the libv4l wrapper libraries at run-time, forcing 62 Ubuntu packages to be modified. Since a patch making these changes for libpt had been uploaded to Fedora (another distribution) three weeks earlier, this patch was cherry-picked into Debian (and Ubuntu).

---

\textsuperscript{10}This notation refers to a bug report in the distribution's bug repository.

C. Dependency Management

Intent: Keeping track of the dependencies of a package to make sure it can be properly built and run.

Motivation: Packages depend on other packages to be built (e.g., compilers and static libraries) and to be run (e.g., dynamic libraries and services). For example, in our data set, Debian packages containing dynamic libraries have on average 6.4 packages depending on them directly (median: 2.0), and 47.6 transitively (median: 3.0). If a package on which many other packages ("reverse-dependencies") depend changes, for example because of an Upstream Sync, that change might break its reverse-dependencies.

A special case of such a change is a "library transition", i.e., a change to the public interface of a shared library that might force dozens of packages to be rebuilt or, in the worst case, to be adapted to the new interface via source code changes. For example, if the C runtime library were to change, all packages using C might need to be changed and/or rebuilt.

**Major Tasks**

1. **Becoming Aware of Dependency Changes** either happens automatically (see Upstream Sync), or based on an announcement by the maintainer of a dependent package that is about to change significantly. The latter announcement typically is sent to the release manager and any affected maintainers, leaving time to discuss the repercussions of the update. If no such announcement has been made, the maintainer should, at the very minimum, notice a change in the API through the updated interface version ("SONAME") of a dynamic library. For example, a dynamic library "libfoo" with interface version 1 would have a SONAME of "libfoo.so.1".
If this SONAME suddenly changed to "libfoo.so.2" upstream, maintainers would know that the API of the component has changed substantially.\textsuperscript{11}

2. **Assessing the Risk of a Dependency Change** is similar to task 2 of an Upstream Sync. Determining which and whose packages broke because of a change is largely a manual task, requiring insight into how an API is used by other packages, whose implementation and algorithms are typically unknown to the maintainer. Unfortunately, no tool support is available in practice to assist in this task. Typically, the build logs are checked for errors and the package is driven through a small smoke test scenario.

3. **Fixing the Damage** either happens atomically, i.e., the changed package and all its reverse-dependencies are updated at once (FreeBSD), or interleaved, i.e., each of the packages is updated independently (Debian/Ubuntu). Atomic updates can delay a new package-version as long as not all broken packages have been updated successfully, but at least the end user will not be impacted by inconsistent packages. Distributions like Fedora and Ubuntu use sandbox build environments to atomically update a transitioning library with all its reverse-dependencies in isolation, without affecting other packages (and hence users) (The Fedora Project 2011).

Whether or not the update model is atomic, the maintainer of the library causing the changes is responsible for performing all rebuilds. The maintainer analyzes the build and test logs to determine which packages failed to build, and attempts to write patches for those, using her knowledge of the API changes. If this fails, she needs to assist the failing packages' maintainers to resolve the transition issues, similar to delivery advocates for ISS reuse (Stol et al. 2011). To keep track of which packages have already been re-built, the release manager and maintainers use a tracking system: Debian and Ubuntu both use a custom library transition tracker, and Ubuntu sometimes uses a bug tracker as well.

4. **Updating the Packages' Metadata**, cf. task 3 of New Package.

5. **Integration Testing**, cf. task 4 of New Package, once the whole transition is complete (atomic model) or for each updated package separately (interleaved model).

6. **Publishing the Package**, cf. task 5 of New Package.

---

\textsuperscript{11}If the maintainer finds out that the interface did change without a SONAME update, she would contact upstream to ask for an update of the SONAME, then perform an Upstream Sync of the updated library before resuming the Dependency Management of the library's reverse-dependencies.

Participants: maintainers of the changed package and those of its reverse-dependencies, release manager.

Interactions: Dependency Management can be accompanied by any other activity, except for New Package. It occurs mostly together with Upstream Sync, Packaging Change, Local Patch and (in source-based systems) Product-wide Concern.

Literature:
Similar to Upstream Sync, Dependency Management is independent of the kind of reuse and organization. Begel et al. (2009) observed a wide range of mitigation techniques for dependency problems at Microsoft, ranging from minimizing the number of dependencies to explicitly planning backup strategies to deal with dependency issues.
Other companies, such as the one studied by de Souza et al. (2004) and de Souza and Redmiles (2008), stressed the importance of vendor-integrator communication to reduce the effort required for "impact management" of reused APIs. Managers should first build an impact network consisting of the people affecting or affected by their component, then use frequent email communication or people assigned explicitly to a particular API (or ISS component (Stol et al. 2011)) to manage forward (i.e., on other teams) and backward (i.e., on their team) dependency impact. Similar to other major companies like Google (Whittaker et al. 2012), as well as the studied OSS distributions, a team is required to inform its clients of major API breakage. de Souza and Redmiles (2008) note, however, that one should not forget the ripple effect of "indirect" (i.e., transitive) dependencies.

Similar to Upstream Sync, backwards compatibility of dependent packages can avoid many integration issues (Crnkovic and Larssom 2002; Stol et al. 2011). Furthermore, many Dependency Management issues are due to unnecessarily high coupling between components caused by relying on implementation details (Spinellis et al. 2004) and private APIs (Stol et al. 2011). Hence, using components via explicit (Stol et al. 2011) and stable (Merilinna and Matinlassi 2006) interfaces can avoid many problems. Finally, packaging organizations like distributions can eliminate many dependency issues of their users by providing assemblies (sets) of integrated components instead of individual components. This is why many distributions offer so-called "virtual" packages, for example to integrate all core packages of Perl, KDE or GNOME.

Notable Instances:

A surprise library transition: A library interface change to the libfm 0.1.14-1 (Debian) file manager library was not announced by the upstream developer. As a consequence, applications built against the old version of libfm ("libfm.so.0"), such as the pcmanfm file manager, broke (#600387). The dynamic linker had no way of knowing that "libfm.so.0" was no longer the original library version all packages were built against, but rather the new version with a different interface that should have been named "libfm.so.1".

Problems with non-atomic fixes of dependency changes: The transition of Perl 5.10 (Debian), the Perl programming language ecosystem, to Perl 5.12 at the end of April 2011 (#619117) took slightly over two weeks, during which over 400 packages (directly or indirectly depending on Perl), including high-profile ones such as vim, subversion, rxvt-unicode and GNOME, were not installable from the staging area until all their dependencies were rebuilt consistently against Perl 5.12.

A dependency change requiring only a rebuild: The chances of acceptance for Boost 1.34.1 (Ubuntu), a general-purpose C++ library, in Ubuntu 7.10 looked slim, since Ubuntu had just entered its "Feature Freeze" (only bug fixes were still accepted for the upcoming release) and all of Boost's reverse-dependencies had to be updated. However, the contributor championing the new Boost release was able to convey the urgency of the release (fixes to show-stopper bugs), and the package maintainer verified that all reverse-dependencies could just be rebuilt without source code changes.
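To close this activity, the sketch below illustrates the scheduling problem behind task 3 (Fixing the Damage): ordering the rebuilds of a transitioning library and its reverse-dependencies so that every package is rebuilt only after the packages it depends on. The package names are hypothetical; real transition trackers derive this graph from the distribution's dependency metadata.

```python
"""Sketch: derive a safe rebuild order for a library transition."""
from graphlib import TopologicalSorter  # Python >= 3.9

# depends[p] = the packages that p needs when being (re)built
depends = {
    "libfoo2":     set(),                       # the transitioning library
    "libfoo-bin":  {"libfoo2"},
    "gui-app":     {"libfoo2", "libfoo-bin"},
    "plugin-pack": {"gui-app"},
}

# static_order() yields every node after all of its predecessors,
# i.e., libfoo2 first and plugin-pack last.
for package in TopologicalSorter(depends).static_order():
    print("rebuild", package)
```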
D. Packaging Change

**Intent:** Changing the packaging logic or metadata to fix packaging bugs, to follow new packaging guidelines or to change the default configuration, either for binary or source packages.

**Motivation:** The packaging process combines the build process (McIntosh et al. 2011) of an upstream component with the dependency management and packaging machinery of a distribution. Hence, understanding the packaging process is not trivial, and bugs slip in frequently. Furthermore, as the packaged component evolves, its packaging requirements evolve as well. For example, new features might have been added that need to be configured in the package. The Packaging Change activity covers any such changes to the packaging, building and installation logic and metadata of a package.

**Major Tasks:**

1. **Replicating Reported Problems** is a prerequisite for fixing a packaging problem. Ideally, the maintainer would like to clone the packaging environment of a bug reporter, or at least have a complete description of the build platform, all installed libraries and their versions. Tools exist to generate such a description when submitting bug reports, yet inexperienced bug reporters often do not know about these tools or forget to use them.

2. **Understanding the Build and Packaging Process** is a necessity for fixing packaging bugs or enhancing the packaging logic. Such understanding is currently based on interpreting the build and execution logs of packages. Furthermore, trial-and-error is commonly used when changing the packaging logic. Since there is no dedicated way to test build and packaging changes, the maintainer verifies the correctness of those changes by manually installing the package and running the unit or user tests of the package.

3. **Integration Testing**, cf. task 4 of New Package.

4. **Publishing the Package**, cf. task 5 of New Package.

**Participants:** maintainer, package community (for testing), expert end-user.

**Interactions:** This activity is performed during most of the other activities, such as New Package and Upstream Sync. Frequently, this activity requires a Local Patch.

**Literature:**

The Packaging Change activity has not been discussed thoroughly in prior research, except for the well-known difficulty of configuring COTS/OSS/ISS components (Stol et al. 2011). Such configuration issues are due to the fact that, by default, components need to be generic and contain many features, whereas a specific integrator only needs some of those. The need to adapt packaging logic is specific to the domain of packaging organizations (of which OSS distributions are a subset), since they are a mediator between upstream components and final users, and hence require upstream components to fit into their own package management system.

**Notable Instances:**

**A package with missing files:** The librt shared library implementing the POSIX Advanced Realtime specification had been dropped without warning from the GNU standard C library on Debian (libc6 2.3.6-18), breaking the XFS file system package (#381881).
To resolve this case of Dependency Management for XFS, a Packaging Change was made to libc6's package metadata to indicate that librt was no longer provided.

Broken packaging because of changed guidelines: Versions 2.6 to 3.2 of Python (Ubuntu), the Python programming language ecosystem, suddenly failed to build on Ubuntu (#738213) because essential libraries like libdb and zlib on which Python depended could no longer be found on the build platform. The cause was a change in directory layout resulting from the work on enabling 32- and 64-bit versions of libraries to be installed on a single machine.

Broken packaging because of upstream changes: The GNU Octave (FreeBSD) developers changed the layout of their web site as well as the build logic of some of their projects (#144512). The maintainer had to fix the code fetching script and refactor the existing build script shared by all GNU Octave ports into separate scripts for the individual ports.

E. Product-wide Concern

Intent: Applying product-wide policies and strategic decisions to the integrated packages.

Motivation: Since a distribution integrates thousands of packages, there are important rules and strategic decisions that should be followed in order to make the distribution coherent and consistent. For example, a new standard for package help files should be adopted by all packages, either all at once or at their own pace. Similarly, strategic decisions to transition to a new version of a core library or to move to a new default window manager should be followed up as uniformly as possible by all involved packages.

Major Tasks:
1. Determining Ownership and Timing of Changes happens through discussions between the co-ordinator (release manager or a volunteer) of the product-wide concern and the affected maintainers. The co-ordinator notifies all affected package maintainers about the decision, explaining the motivation of the Product-wide Concern, the end goal and the different steps involved in getting there. Those steps depend on the enforcement strategy in use.

2. Enforcing the Concern happens either through centralized or distributed enforcement. With centralized enforcement, the Product-wide Concern co-ordinator applies the concern's changes herself on all affected packages at once. Maintainers only need to test whether their package still works and report a bug if it does not. With distributed enforcement, the package maintainers, briefed by the co-ordinator, are in charge of the change for their own package. This gives them the freedom to implement a Product-wide Concern as they see fit, but might delay updates to their packages' reverse-dependencies. While the concern is being enforced, the co-ordinator continuously monitors the status of the concern via dashboards, mailing lists and/or bug reporting systems.

Debian uses distributed enforcement, FreeBSD uses centralized enforcement and Ubuntu uses both. Derived distributions like Ubuntu automatically leverage Product-wide Concern changes performed by the contributors of the parent distribution. FreeBSD co-ordinators use regular expressions to change the packaging logic of hundreds of ports at once, thanks to the strict naming conventions in that logic. Given the high risk of such product-wide changes in FreeBSD, the co-ordinator needs approval by the release manager, after which the whole distribution is rebuilt on the distribution's build cluster to check the effects of the product-wide change.
3. Integration Testing, cf. task 4 of New Package.

4. Publishing the Package, cf. task 5 of New Package.

Participants: maintainer, co-ordinator, release manager.

Interactions: Product-wide Concern is typically accompanied by Dependency Management, Upstream Sync or Packaging Change.

Literature:
Similar to Packaging Change, Product-wide Concern is a relatively unknown activity. For example, Curtis et al. (1988) identify the issue that "Projects must be aligned with company goals and [that they] are affected by corporate politics, culture, and procedures", and they stress that the "inter-team group dynamics" (between an integrator and upstream) significantly complicate the already complex "intra-team group dynamics". However, no concrete advice or discussion of the tasks involved is provided, especially not in the context of multi-component integration at the scale of OSS distributions (thousands of integrated components).

Notable Instances:
The massive migration to GCC 4 (Debian) in July 2005 is an example of a Product-wide Concern with distributed enforcement. Since the compiler suite broke C++ programs compiled with earlier GCC versions, all C++ packages using GCC had to be rebuilt. An approach typically followed in cases like this\textsuperscript{12,13} is to (permanently) rename the packages after rebuilding by attaching a suffix like "+b2". This ensures the visibility of rebuilt packages, enabling other packages to explicitly depend on the rebuilt versions.

The migration to Dash as the default command shell in Ubuntu 6.10 (October 2006) and Debian Lenny (February 2009) illustrates the differences between centralized and distributed enforcement. The Ubuntu co-ordinator instantaneously made Dash the default shell, breaking many packages' scripts and build files (centralized). Although several users were enraged, the co-ordinator consistently referred to the maintainers and upstream developers of the failing packages to fix incompatible Bash-specific code ("bashisms"). A web site with official migration strategies and workarounds was provided.

When Debian discussed their move to Dash (independently of the Ubuntu move),\textsuperscript{14} the Ubuntu co-ordinator convinced them of the importance of clear release goals and communication with all stakeholders. The Debian developers then built tools to screen all packages for known bashisms. Maintainers of packages containing bashisms were notified by email and requested to fix the bashisms by a certain date (distributed).

---

\textsuperscript{12}http://bit.ly/FOCJHf
\textsuperscript{13}http://lwn.net/Articles/160330/
\textsuperscript{14}http://bit.ly/z3ORxT

F. Local Patch

Intent: Maintaining local fixes and/or customizations to a package.

Motivation: Integrators and their users will find bugs in packages. Some of these bugs are package-specific, while others are due to the integration of the package in the distribution. Typically, maintainers are encouraged to send the fixes for both kinds of bugs upstream, such that the upstream project will take ownership of the code (and its maintenance) and include it by default in their project. In practice, however, many integration bug fixes are not accepted by upstream (or take time to be adopted) and tend to end up as local patches that need to be maintained by the integrator and re-applied upon each Upstream Sync. The same holds for customization changes specific to a distribution, for example because of a Product-wide Concern.

Major Tasks:
1. Getting a Local Patch Accepted Upstream requires a patch that fixes the bug in a clean way and follows the programming guidelines of the upstream developers. After thorough testing, the maintainer submits the patch to the preferred bug reporting system of the upstream project. The report should be as detailed as possible, making clear what bug is fixed, in which version of the project, and what the impact is on the users of the distribution. If the patch is accepted within a reasonable period of time, the maintainer can discard his Local Patch. Otherwise, the maintainer is responsible for maintaining and re-applying the Local Patch across all future versions of the package.

2. Maintaining the Patch upon an Upstream Sync is the maintainer's responsibility until the Local Patch is accepted by upstream (if ever), cf. task 3 of Upstream Sync. As such, Local Patch is a very common activity, involving 22.1 ± 5 % (Debian), 28.4 ± 5 % (Ubuntu) and 12.2 ± 5 % (FreeBSD) of all package-versions. Of these versions, only 7 ± 5 % (Debian), 0.3 ± 5 % (Ubuntu) and 0 ± 5 % (FreeBSD) had to update an existing Local Patch, whereas 24.7 ± 5 % (Debian), 11.9 ± 5 % (Ubuntu) and 6.3 ± 5 % (FreeBSD) could stop maintaining the Local Patch because it was included into a new upstream version. To keep track of local patches, Debian-based distributions use patch management systems such as "quilt", "dpatch" and "git", while FreeBSD maintainers manage patches manually (a sketch of the routine "does this patch still apply?" check is given at the end of this activity's discussion).

3. Updating the Package's Metadata, cf. task 3 of New Package.
4. Integration Testing, cf. task 4 of New Package.
5. Publishing the Package, cf. task 5 of New Package.

Participants: maintainer, upstream developer, bug reporter.

Interactions: Local Patch is typically accompanied by Upstream Sync, Packaging Change, or Dependency Management.

Literature:
The paradox of, on the one hand, having to submit a patch upstream to avoid maintenance, but, on the other hand, having a hard time getting the patch accepted is the most studied integration challenge in the literature, across different kinds of reuse and organizations (Bac et al. 2005; Brownsword et al. 2000; Merilinna and Matinlassi 2006; Spinellis et al. 2004; Stol et al. 2011). No silver bullet exists, although, similar to Upstream Sync and Dependency Management, close collaboration of an organization with the upstream project is generally recommended (Stol et al. 2011), even in the case of COTS (Morisio et al. 2002). However, such a collaboration takes a lot of time, effort and goodwill, and also does not guarantee that the upstream project will accept and maintain the patch (Ven and Mannaert 2008). In fact, it often happens that even an accepted patch still needs to be maintained by the downstream organization (since the organization has the required expertise) (Jaaksi 2007).

An opposite approach has been successful in the case of ISS, where the ISS team reaches out to the teams that reuse its components to help them with integration (Stol et al. 2011). Alternatively, one could use COTS-style glue or wrapper code to avoid changing the actual code altogether (Di Giacomo 2005; Van Der Linden 2009). However, such approaches are less powerful (one loses the benefits of OSS/ISS) and still require maintenance.
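Returning to the day-to-day mechanics of task 2, the sketch below checks which local patches still apply cleanly to a freshly unpacked upstream tree, using GNU patch's dry-run mode. The directory layout is hypothetical; tools like quilt add bookkeeping on top of essentially this check.

```python
"""Sketch: after unpacking a new upstream version, check which local
patches still apply cleanly. A failing patch either needs rework or
was merged upstream and can be dropped."""
import pathlib
import subprocess

UPSTREAM_TREE = pathlib.Path("foo-2.0")     # freshly unpacked new release
PATCH_DIR = pathlib.Path("debian/patches")  # the maintainer's local patches

for patch in sorted(PATCH_DIR.glob("*.patch")):
    result = subprocess.run(
        ["patch", "--dry-run", "-p1", "-d", str(UPSTREAM_TREE),
         "-i", str(patch.resolve())],
        capture_output=True, text=True)
    verdict = ("still applies" if result.returncode == 0
               else "needs rework (or was merged upstream)")
    print(f"{patch.name}: {verdict}")
```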
As a kind of middle ground, many organizations use packaging organizations like OSS distributions as a maintenance buffer between upstream and themselves (Merilinna and Matinlassi 2006), shifting the problem to the distributions. In the presence of sufficient industrial partners, one could even consider making an independent fork of an upstream component, but this is quite costly and in the end not that successful in practice (Ven and Mannaert 2008). Note that patches for local usage or configuration will never be picked up upstream, and hence require indefinite maintenance. This applies especially to end-users, who might have local patches on top of a distribution's package.

Notable Instance:

A patch that is quickly adopted upstream: The Debian and Ubuntu packages of the GNOME sensors-applet (Debian/Ubuntu) desktop widget for temperature and other sensors featured "ugly, outdated icons" (#69800), because the newer icons did not comply with the license policy of Debian and Ubuntu. To fix this, the Ubuntu maintainer built a local patch on top of the Debian package to use the newer icons in Ubuntu, while the upstream developer contacted the icon designer to make the new icons compatible with Debian by adding an additional license to the icons (an example of the "Disjunctive" legal pattern (German and Hassan 2009)). The designer complied, and the Ubuntu maintainer reported the license change to the Debian maintainer, such that he could drop his Local Patch.

A Local Patch can cause havoc: A notorious security hole in the OpenSSL Debian package (an implementation of the SSL/TLS protocols) was introduced into Debian by a local patch and lasted from May 2006 until May 2008.\textsuperscript{15,16,17} A call to the function adding randomness to a cryptographic key had accidentally been commented out by a Local Patch (#363516). The Debian maintainer had contacted upstream, but did not fully disclose himself or his plans, and was largely ignored. The patch was never sent upstream for inclusion afterwards. To complicate the issue further, the address of the mailing list contacted by Debian was not the real OpenSSL development list, since that one was hidden from non-developers. This security hole propagated to over 44 derived distributions, without any of the maintainers or contributors involved identifying the bug.

---

\textsuperscript{15}http://lwn.net/Articles/282038/
\textsuperscript{16}http://bit.ly/w7rn04
\textsuperscript{17}http://www.links.org/?p=327

G. Maintainer Transfer

Intent:
Maintaining a package when its maintainer is absent, or unwilling or unable to maintain it further.

Motivation:
Being a package maintainer is a major responsibility, since it requires mediating between upstream projects and the end-user, typically for multiple packages at a time. However, maintainers may have periods during which they cannot spend the required time on integration, they may lose interest in certain packages, or they could just become unresponsive to bug reports or user requests. In the worst case, a package could even be orphaned when the maintainer quits. To prevent packages (and any products based on them (Van Der Linden 2009)) from stalling, OSS distributions need to provide a means to keep packages evolving, while bypassing or overriding a maintainer.

Major Tasks:
1. Overriding the Maintainer depends on how a distribution organizes package ownership. If package maintenance is shared across all distribution developers collectively, the concept of overriding a maintainer is not relevant.
In Ubuntu, for example, packages in the commercially supported Main and Restricted archives are managed by a team known as Core Developers, whereas the packages in the commercially unsupported Universe and Multiverse archives are supported by the community under the guidance of a team known as "Masters Of The Universe" (MOTU). Any developer can modify any package, as long as it is managed by the developer's collective and the change does not introduce unnecessary divergences compared to upstream. In case of disagreement amongst developers, there are conflict resolution procedures in place, but those rarely need to be used.

Distributions with individual package ownership, on the other hand, need a Maintainer Transfer policy to take over the role of a maintainer if she becomes unresponsive or disappears altogether. A contributor proposing an Upstream Sync, Dependency Management, Packaging Change or a Local Patch that fulfills certain criteria can explicitly mark her change as a Maintainer Transfer. In Debian, for example, this is called a "Non-Maintainer Upload" (NMU), and is only valid for changes that fix an important, known bug. Debian provides the "nmudiff" tool to help contributors submit NMUs.

The unique property of a Maintainer Transfer change is that a timer is attached to it, with a delay depending on the severity of the proposed change (e.g., FreeBSD typically uses a delay of two weeks). Unless the maintainer replies to the change in time, the change is set to go in automatically once the timer expires. If the maintainer replies in time, she can request suspending the timer in order to review the change. If she does not approve the change, the contributor needs to revise it according to the maintainer's comments.

We found that 5.7 ± 5 % (Debian) and 2.9 ± 5 % (FreeBSD) of all package-versions contain an instance of Maintainer Transfer (Ubuntu has collective package ownership, hence does not have such transfers). The min/median/max number of days until such changes were accepted is 0/1.5/556 days for Debian and 1/16/465 days for FreeBSD. In Debian, the median value is very low, indicating that maintainers often commit a Maintainer Transfer before the timer goes off. In FreeBSD, time-outs are much more common. The cases with the maximum time-out in Debian (#325110) and FreeBSD (#140303) correspond to packages that were temporarily orphaned, i.e., the maintainer officially stepped down.

2. Supporting Orphaned Packages is typically done by an ad hoc team of volunteers, based on casual contributions or reported critical bugs. In Debian, the QA team typically jumps in to make changes to orphaned packages.

3. Adopting Orphaned Packages happens either through volunteers interested in an orphaned package, or by convention, when a contributor provides patches for an orphaned package and automatically becomes the new maintainer. For example, if no feedback is received for a patch in FreeBSD within three months, the maintainer is deemed to have abandoned the package and any contributor may assume maintainership (The FreeBSD Documentation Project 2011, Section 5.5).

Participants: maintainer, contributor.

Interactions: Maintainer Transfer can co-occur with all other activities, except for New Package.

Literature:
We could not find any reference to the Maintainer Transfer activity in the literature.
However, Curtis et al. (1988) and Lewis et al. (2000) do stress the importance of having "system-level thinkers" as maintainers, who are able to sufficiently understand both the specific domain of the integrated component and the overall architecture of their own system. According to our analysis, the Maintainer Transfer activity would kick in as soon as the maintainer of a component no longer possesses those skills.

Notable Instances:
An NMU helping out a busy maintainer: httrack 3.40.4-3.1 (Debian), an offline browser, fixed an issue with the file system locations for test files. The bug was reported on the 11th of October 2006, followed one week later by a proposed NMU by a contributor. A couple of hours later the NMU was approved by the maintainer, who noted (#392419): "Thanks a lot, I didn't yet had [sic] the change [sic] to review the issue".

**An NMU with strings attached:** The maintainer of *libcdio 0.78.2+dfsg1-2.1 (Debian)*, a library for accessing CD media, had been warned on the 20th of January 2008 about C++ header file issues with the upcoming release of GCC 4.3 (*Product-wide Concern*). Two months later, a contributor sent in an NMU patch fixing the compiler errors. One day later, the maintainer chimed in (#461683): "I don't object to a NMU (I know I haven't been handling my libcdio package in the best possible way), but if you wish to NMU, please consider applying the patches that were sent to other bug reports". The NMU was approved the same day.

**A hostile NMU:** On the 18th of May 2007, a contributor requested an *Upstream Sync* to the new upstream release (1.3.2) of *libjcalendar-java 1.2.2-6.1 (Debian)*, a calendar picker component, and also proposed a *Packaging Change* to support the Kaffe Java VM. However, since nothing happened for one week, the contributor added a comment to both bug reports stating "I am planning a NMU if nothing happens (again)" (#424981, #424982). The next day, the maintainer replied (#424981): "I admit that I'm not very reactive, but before you do your NMU, have you checked that Jcalendar 1.3.2 is backwards compatible with version 1.2?". Nothing happened for another 1.5 months, until the NMU timer expired and the NMU went in.

5 Identified Integration Challenges

The seven discussed integration activities document the complexity of integration. Even in the simplest case, i.e., black box integration, maintainers still need to package the integrated project (*New Package*), verify whether the integrated product is compatible with each *Upstream Sync*, and follow up on *Dependency Management* changes like library transitions. In the case of white box integration, the integrated projects need to be customized or fixed with *Local Patches*, and streamlined to product-wide policies (*Product-wide Concerns*). All the while, the packaging logic and configuration files need to be kept up-to-date (*Packaging Change*), and maintainer activity needs to be monitored (*Maintainer Transfer*).

To paraphrase Curtis et al. (1988), we "are not claiming to have discovered new insights" for OSS integration; instead, we identified and documented the core integration activities that the maintainers of three large OSS distributions perform on a daily basis "to help identify which factors must be attacked to improve" integration.
Although distributions have guidelines on how to address some of these activities (Debian project 2011; The FreeBSD Documentation Project 2011), the differences in terminology (e.g., "NMU" vs. "time-out") and technical procedures (e.g., centralized vs. distributed *Product-wide Concern*) make it confusing to understand and compare the activities, or to study possible tools and techniques to improve these activities. Hence, the unifying vocabulary that we provide is key to understanding the integration process of upstream components, complementing existing work on code integration (Coplien et al. 1998; DeLine 1999; Frakes and Kang 2005; Parnas 1976; Pohl et al. 2005) and on selection of reusable components (Bhuta et al. 2007; Chen et al. 2008; Li et al. 2009). Finally, we also compared the activities to those in prior work, in particular in commercial settings.

Throughout our analyses and the documentation of the seven integration activities, we distilled 13 concrete challenges, summarized in Table 4 across four different research areas. Most of the challenges have been discussed earlier in this paper.

Table 4 Open challenges for integration activities

| Area | Challenge |
|--------|---------------------------------------------------------------------------|
| packaging | · insight into upstream build process |
| | · automatic build-/run-time dependency extraction |
| | · accurate replication of packaging environment |
| testing | · cross-platform testing of package & its dependencies |
| | · integration testing during packaging |
| | · accurate replication of functionality issues |
| evolution | · determining best moment for Upstream Sync |
| | · insight into upstream changes |
| | · recommendations about important API changes |
| | · management of ownership of package changes |
| merging | · prediction of integration defects |
| | · identifying opportunities for cherry-picking |
| | · insight into merge status of Local Patches |

Ubuntu and Debian are currently in the process of designing an automatic unit and integration testing system for the packaging process. Similar to defect prediction work at the code level, prediction of integration defects and of the effort involved with fixing these defects would be extremely useful. There is some initial work on this (Mohamed et al. 2008; Yakimovich et al. 1999), but more work is needed to bring such techniques to practitioners. Similarly, a Bugzilla-like repository for managing ownership of changes (i.e., who should update reverse-dependencies, who should perform a Product-wide Concern, or who should act on an NMU) is needed to improve communication among all involved parties. Insight into the upstream build process (Adams et al. 2007; Qiang and Godfrey 2001) currently relies on manual tracing and analysis of build and run-time logs, with only some packages having rudimentary scripts for checking run-time dependencies. In general, however, the ability to accurately replicate bugs in code and build is missing. Packaging environments can vary widely between users, with certain combinations of package and distribution versions causing subtle packaging or run-time problems (the sketch below illustrates the kind of environment snapshot that would help).
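The following sketch shows such an environment snapshot for a Debian-like system; it is illustrative only, capturing just the platform and the exact versions of all installed packages, so that a maintainer could compare a reporter's environment against her own.

```python
"""Sketch: snapshot a reporter's packaging environment (platform plus
the exact version of every installed package on a Debian-like system)."""
import json
import platform
import subprocess

def installed_packages():
    """Map each installed package to its exact version via dpkg-query."""
    out = subprocess.run(
        ["dpkg-query", "-W", "-f", "${Package}\t${Version}\n"],
        capture_output=True, text=True, check=True).stdout
    return dict(line.split("\t", 1) for line in out.splitlines() if line)

snapshot = {
    "machine": platform.machine(),   # e.g., x86_64
    "kernel": platform.release(),
    "packages": installed_packages(),
}
print(json.dumps(snapshot, indent=2))
```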
Current bug reporting tools automatically include detailed platform information, yet such information is often insufficient to identify Dependency Management changes.

As the above challenges impact even three of the largest and most popular OSS distributions, more powerful tool and process support is essential for most of the OSS integration activities, complementing the mailing lists, bug repositories, and custom dashboards (for example, to track library transitions) currently in use by organizations. Until now, researchers have studied only some of the challenges, such as API changes (Dagenais and Robillard 2008) and merge defects (Brun et al. 2011; Shihab et al. 2012). Clearly, more research is needed to support maintainers in the field.

6 Evaluation

The six contacted maintainers pointed out some small factual errors in an earlier version of the documented integration activities, as well as recent advances (e.g., regarding the automatic test systems being built for Debian and Ubuntu). However, no fundamental errors were identified, nor was any activity discarded. The identified inaccuracies have been fixed in the activity descriptions above.

Regarding the completeness and usefulness of the documented activities, Table 5 summarizes the replies of the six contacted maintainers. As explained in Section 3.6, two maintainers (M2 and M5) provided empty replies for at least two questions, while M1 left one question open. Hence, we obtained some empty replies for Q2, Q3 and Q5. We now discuss each question's answers.

Table 5 Overview of the feedback of the six contacted maintainers on questions Q1–Q5

| | M1 | M2 | M3 | M4 | M5 | M6 |
|---|---|---|---|---|---|---|
| Q1 | license/copyright analysis | vulnerability resolution; post-release maintenance | no | upstream lobbying; post-release maintenance | package end-of-life | monitoring downstream distributions for bugs/patches |
| Q2 | people unfamiliar with topic | < no reply > | major activities in easy-to-read way | useful overview; do we document our activities? | < no reply > | nice intro to what being distro dev is about |
| Q3 | < no reply > | < no reply > | more detail/examples | what to do? | nothing | none |
| Q4 | license tracking | none | none | timely integration; desktop vs. enterprise; hundreds of variants | none | monitoring the status of all packages in the distribution |
| Q5 | DEP5/CDBS license checking; autom. dep. checking | < no reply > | automated testing; autom. dep. checking | good question :−) | < no reply > | improvements to package process; atomic package updates |

**Q1. What activities did we miss?** Five of the maintainers pointed out missing activities, although many of these were already captured in some form.

A. "Upstream Lobbying" was in fact mentioned as part of Local Patch, but M4 found that it deserved its own activity. Interestingly, M6 mentioned the inverse kind of lobbying, i.e., lobbying in derived distributions for newly reported or fixed bugs. Instead of splitting up Local Patch, we decided to keep this activity as is, but add more detail about the lobbying part.

B. "Post-release Maintenance" was suggested by M4 and M2 as a dedicated integration activity encompassing all the activities occurring after a new package-version has made it into a new release of the distribution. M4 notes that "while the maintainer isn't required to support the use of a product they [sic] are often the first person contacted if someone can't get to build on FreeBSD". Our activities do not capture this activity by itself, only its outcome, for example in the form of a Packaging Change or Local Patch. This is because many emails could be exchanged regarding a maintenance problem without a corresponding change log item or bug report (i.e., our data set does not capture such discussions). Although this hints at less important integration issues (since they did not need to be fixed or acted upon in some form), future work should analyze the mailing list data of the distributions to uncover this part of the integration work.

C. "License/Copyright Analysis" was mentioned by M1 as an important activity: "copyright/licensing analysis isn't mentioned anywhere, yet it's often a tiresome process when creating a new package (and often forgot [sic] to update on upstream sync)". License analysis did not occur very often in our data set; for example, in our Ubuntu samples we only found one occurrence (version "0.4-0ubuntu1" of package "branding-ubuntu"), in which case the license of some files had not been specified as being GPL. For this reason, the activity is captured in our Other category.

D. "Vulnerability Resolution" was pointed out by M2 as a missing activity, i.e., the steps performed to address a vulnerability in a timely manner after release. Although it is not one of the top 7 activities (and hence not documented in detail by us), vulnerability resolution occurred relatively often (Table 3), in 4.4 ± 5 % (Debian), 1.3 ± 5 % (Ubuntu) and 0.8 ± 5 % (FreeBSD) of all package-versions. Our data shows that most of these vulnerabilities were reported and fixed upstream. Similar to Upstream Sync, distributions first have to become aware of vulnerabilities, then update their packages as soon as a fix is available.

For this reason, vulnerability changes tend to use NMUs (see Maintainer Transfer), since the security team wants to update a vulnerable package as soon as possible, overruling the maintainer if necessary. Often, vulnerability fixes are cherry-picked, leaving other upstream changes until the next official Upstream Sync. For example, cups-base revision 1.44 (FreeBSD) (24th of January 2005) fixed a vulnerability in the Cups printer server identified and reported upstream by a university student, while php4 4:4.4.0-3ubuntu1 (Ubuntu) cherry-picked eight upstream vulnerability fixes for the PHP programming language (19th of December 2005). Since the full details of vulnerabilities and how they were processed internally are not available in publicly accessible databases, and since this activity is less common than the seven documented activities, detailed analysis of it is future work.

E. "Package End-of-life" was a missing and often overlooked activity according to M5. Some packages lose user and maintainer interest over time; hence, when the distribution evolves and integration activities need to be performed on the package, either nobody steps up or substantial effort is required by other maintainers to keep the package up-to-date.
Similarly, if an older version of a library is rendered obsolete by a newer one, or the older version starts to create conflicts with the newer one, the older version needs to be removed from the distribution. However, we did not find evidence of this activity in our data samples. Our Maintainer Transfer activity comes closest, since it occurs when an unmaintained package is "saved" from end-of-life by a new maintainer.

Surprisingly, the Internationalization activity, which is the ninth most frequent activity that we found (Table 3), was not mentioned by any maintainer. This activity comprises all the work related to the translation and adaptation of a package to other cultures (e.g., different currencies) (Xia et al. 2013). Since distributions reach significantly more users than an individual upstream project could reach on its own, a packaged project has a higher chance of being used in non-English locales. Hence, distributions typically have dedicated teams addressing the internationalization needs of their packages. For example, the debian-l10n-english team works on the translation templates of packages to facilitate the job of translators (who are often not software engineering experts).

Distributions typically solicit Internationalization patches once development has been frozen, i.e., the basic new functionality has been stabilized and only bug fixes are still allowed. Although Internationalization changes are typically harmless, they can in rare cases keep packages from executing. In January 2006, for example, an incomplete Japanese character prevented the xchat IRC client of FreeBSD from executing. A one-character fix in a translation template resolved this issue.

**Q2. What can the documented activities be used for?** M1, M3 and M4 agree that the documented patterns provide a clear overview of the major integration activities, which is useful for novices (M1) as well as for any stakeholder involved in integration (M3/M4). M4 noted that the activities do not necessarily need to be used as direct documentation. They could also be used to check how well the distribution collects data or monitors the progress of each integration activity. M3 informed us that the structured, accessible explanations of the major integration activities piqued the interest of two of his package testers, which he believes to be a success. M6 recommended that we "reach out to developers communities with this documentation. E.g., you could write a blog post providing an introduction to your paper, targeted at distribution devs". We are planning to follow up on this suggestion.

**Q3. Which existing tools and techniques for these activities did we miss?** M3 was interested in getting more details and examples for each activity, while M4 wanted to know what the recommended practices and tools for each activity are. Our documented activities deliberately describe only the major tasks and how they are implemented in the three considered distributions, without a dedicated section for "best practices". Given the many challenges identified in Section 5 as well as in Section 2, many activities rely on manual work, and hence do not yet have best practices.

**Q4. What challenges did we miss?** M1 again mentioned license tracking. M4 noted that the largest challenge is not how to perform each activity, but how to perform them on time. Given the ever-shorter time frame between releases (Hertzog 2011; Remnant 2011; Shuttleworth 2008), this is indeed an important constraint on the identified challenges.
**Q2. What can the documented activities be used for?** M1, M3 and M4 agree that the documented patterns provide a clear overview of the major integration activities, which is useful for novices (M1) as well as for any stakeholder involved in integration (M3/M4). M4 noted that the activities do not necessarily need to be used as direct documentation: they could also be used to check how well the distribution collects data on, or monitors the progress of, each integration activity. M3 informed us that the structured, accessible explanations of the major integration activities piqued the interest of two of his package testers, which he considers a success. M6 recommended that we "reach out to developers communities with this documentation. E.g., you could write a blog post providing an introduction to your paper, targeted at distribution devs". We are planning to follow up on this suggestion.

**Q3. What is missing from the documented activities?** M3 was interested in getting more details and examples for each activity, while M4 wanted to know the recommended practices and tools for each activity. Our documented activities deliberately describe only the major tasks and how they are implemented in the three studied distributions, without a dedicated section on "best practices". Given the many challenges identified in Section 5 as well as in Section 2, many activities rely on manual work and hence do not yet have best practices.

**Q4. What challenges did we miss?** M1 again mentioned license tracking. M4 noted that the largest challenge is not how to perform each activity, but how to perform them on time. Given the ever-shorter time frame between releases (Hertzog 2011; Remnant 2011; Shuttleworth 2008), this is indeed an important constraint on the identified challenges. Furthermore, the right activity at a particular moment also depends on the end user: "desktop users want updates ASAP while enterprise users don't want to change their software for multiple years". This echoes known phenomena such as Microsoft's monthly "patch Tuesday" (Lemos 2003) and Mozilla's extended support releases for companies (Khomh et al. 2012). M4 concluded by warning about the challenges posed by the hundreds of variations in build systems, versioning schemes, projects, etc. Related to this, M6 noted that "something orthogonal is the management of a large amount of software packages: getting a global overview from their status is not easy". This ties into the management-related challenges of Table 4 identified from our data.

**Q5. What promising tools/techniques do you see coming up to address some of the challenges?** Both M1 and M3 expect automated dependency checking tools to become mainstream: "It may take some time to make that automatic but we are getting closer every day". Such tools would improve at least the Upstream Sync and Dependency Management activities. M1 mentioned two promising license analysis tools, while M3 remarked that "We already have automated testing tools in Ubuntu (see QA team) so we are heading in the right direction here". M6 saw the advent of atomic package updates and other packaging process improvements as a promising development.

Overall, the six maintainers liked the work and found that the documented activities described their daily activities "quite well" (M6). They would not necessarily use our documented representation of the activities themselves (it is targeted more towards novices), except to systematically check which activities their distribution is not tracking (M4). Some missing important activities were identified, in particular license analysis and the tracking of licensing changes, vulnerability resolution, and post-release maintenance, as well as some missing challenges (especially time pressure). Finally, some tool support for dependency checking is expected to arrive in the medium term; however, many challenges remain open.

7 Threats to Validity

With respect to construct validity, there are several threats to consider. First, we used the change log messages as a representative record of the maintainers' activities; based on these, we identified important bug reports for in-depth manual analysis, complemented (where necessary) by mailing list messages and other kinds of documentation. We did not formally verify the accuracy of these data sources, nor their completeness. Although M6 warned that the log message of the first version of a Debian package does not always mention whether Local Patch has been performed, none of the four instances of New Package that we found suffered from this issue.

There is no further evidence suggesting that the logs are incorrect: the three analyzed distributions require their maintainers to provide log messages (Debian project 2011; Koshy 2013), since those are the primary input for end users and other maintainers affected by changes to a package. In fact, bug reports and mailing lists, together with IRC chat messages, form the official means of communication in OSS distributions.
In cases where a bug report identifier was missing (cf. Fig. 4), either the change log item was sufficiently clear or we were able to find a related email message via a web search.

Second, we only analyzed a subset of the package-versions and, hence, of the change logs. To mitigate this threat, we randomly sampled a subset of package-versions large enough to obtain a confidence interval of ±5 % at a 95 % confidence level. Furthermore, the activities that we identified for Ubuntu and FreeBSD did not add any new activity on top of those identified for Debian.

Third, our algorithm for reconstructing "versions" from the FreeBSD CVS commits depends on conventions that are documented by the FreeBSD project, but not explicitly enforced. It is possible that the recovered versions are either too fine-grained (under-approximating the actual number of activities performed for a version) or too coarse-grained (over-approximating it). Feedback from the package maintainers confirmed that the algorithm is correct and that deviations from the guidelines should be minimal.

Fourth, since we study individual package-versions, our sample could contain multiple versions of some packages, just one version of other packages, and no version at all of the remaining packages. Such an approach is necessary, since large projects like KDE or GNOME involve more integration effort than smaller projects, and hence need to carry more weight in our study. In addition, such projects typically also have a larger number of associated packages, which increases their weight further. The risk that this sampling decision biases the observed activities is small, since ecosystems like KDE and GNOME consist of hundreds of different applications and tools, developed by hundreds of developers and packaged by dozens of maintainers. In other words, even inside one such ecosystem, we should still expect a large diversity in integration activities.

Regarding internal validity, as mentioned above we rely on the accuracy and completeness of the logs of each package-version. Even in the event that some activities were not documented in the logs, there is no specific reason to believe that some activities would be documented less than others; hence, this effect should cancel out across the different activities. For example, Post-release Maintenance was missed in our results, since "unimportant" discussions (i.e., those without an explicit bug report or patch attached to them) left no trace in the change logs and their referenced bug reports, across all three distributions.

Furthermore, the nature of manual classification implies that there might be some misclassifications (of both activities and challenges). To mitigate this, the logs were interpreted by two of the authors, both of whom have experience with integration tasks (one of them is a Debian/Kubuntu developer), and they discussed their decisions with each other to resolve differences and reach consensus. These discussions also addressed the possible bias introduced by having the first set of tags derived by only one of the authors. Furthermore, to validate the discovered integration patterns and open challenges, we reached out to six maintainers/release engineers of Debian, Ubuntu and FreeBSD to evaluate and provide feedback on these patterns. Nonetheless, the quantitative results of this paper (the prevalence of each activity) are exploratory only, and we do not extrapolate them.
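The sample size behind such a ±5 % interval at 95 % confidence follows from the standard proportion-estimation formula with finite-population correction (Cochran 1963). Below is a minimal sketch of the calculation; p = 0.5 is the usual worst-case assumption, and the population figure is purely illustrative:

```python
import math

def sample_size(population: int, z: float = 1.96, e: float = 0.05,
                p: float = 0.5) -> int:
    """Cochran's sample size for estimating a proportion, with
    finite-population correction (z = 1.96 for 95 % confidence)."""
    n0 = (z ** 2) * p * (1 - p) / (e ** 2)  # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)    # finite-population correction
    return math.ceil(n)

# e.g., for a hypothetical pool of 100,000 package-versions:
print(sample_size(100_000))  # -> 383
```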
The evaluation by the six maintainers was performed entirely via email, since this is the preferred means of communication for maintainers (and bug repositories, as discussed, are not suited to it). Furthermore, the asynchronous nature of email gave the maintainers breathing space and made it easier for them to fit their feedback in among their voluntary open source activities and day jobs. Even then, we observed that some of the questions were not addressed. In future work, we might complement asynchronous email with synchronous follow-up via, for example, instant messaging (IRC).

The open replies by some of the maintainers, as well as the selection of maintainers for the evaluation, could also introduce bias. M2 provided three open replies, M5 two and M1 one, yielding a total of 6 open replies out of 30 (20 %). Due to the distribution of the open replies across the questions, each question obtained at least four concrete replies (two obtained six). Furthermore, the open replies are spread across the Debian and Ubuntu maintainers, reducing the overall impact of the missing data even further. Regarding selection bias, all six maintainers were experienced maintainers in their respective OSS distribution, covering a range of packages of different sizes and domains.

An alternative evaluation methodology would have been to first perform a survey or interviews, and only then empirically analyze and validate the research findings on change log and other data. However, doing so would have biased our results towards the activities that stakeholders think are important, not necessarily all the important activities that they actually perform. Some essential activities would never have surfaced.

With respect to external validity, we have analyzed three of the largest OSS distributions as exemplars of packaging organizations. Since integration is the central activity of OSS distributions, we expect the identified activities to be representative of many of the activities that other packaging organizations would face in the case of OSS reuse. For example, packaging organizations like GNOME and KDE, or even "regular" Java or C++ systems that reuse multiple open source libraries, also have to deal with Upstream Sync (e.g., reusing a new version of log4j), Dependency Management (e.g., adding the dependencies of the new version of log4j) and Local Patch (e.g., customizing the new version of log4j to fix a bug). Nevertheless, manual analysis of other kinds of OSS distributions (e.g., Fedora-based ones), of packaging organizations in general, or of any organization that performs multi-component integration is necessary to confirm these conjectures and to validate the generalizability of the seven integration activities. Such an analysis might discover new activities, for example in packaging organizations that do not build products for end users but rather middleware or frameworks for other companies to build on.

8 Conclusion

Software reuse is a major tenet of software engineering, yet the integration activities that accompany it, be it in a COTS, OSS or ISS context, introduce unforeseen maintenance costs.
Since more empirical research is necessary in this area to help organizations reuse components successfully, and since most studies thus far have focused on the integration of individual components and/or on non-OSS integration, we performed a large-scale study of three successful OSS distributions: Debian, Ubuntu and FreeBSD.

Analysis of a large sample of change log messages, bug reports and other historical integration data resulted in the identification of seven major integration activities, whose processes we documented in a pattern-like fashion to help organizations and researchers understand the responsibilities involved in integration. The activities were shown to be non-trivial and to require a large amount of effort, and they were validated by six maintainers of the three distributions. Based on the seven documented activities, the major challenges turned out to be related to the cherry-picking of safe changes from a new upstream release, the management of dependencies between packages, the testing of packages, and coordination among maintainers. Models and tools are needed to support these integration activities.

By providing a unified terminology across distributions and by documenting the integration activities in a structured way, our catalogue of activities enables maintainers of open source distributions, organizations interested in reusing OSS or ISS components, and researchers to better understand the challenges and activities that they face, and to plan policies, tools and methods to address these challenges. Together with other studies on integration, it could form the basis of a dedicated training program on integration, aimed at developers and their managers, with the goal of reducing, or at least stabilizing, the maintenance costs caused by integration.

Finally, and very encouragingly, all the distribution maintainers we contacted hope that the documented activities and challenges will inspire researchers to start up a research program in the domain of reuse and integration.

Acknowledgments The authors would like to thank all maintainers and release engineers of Debian, Ubuntu and FreeBSD who participated in our study, either directly (by providing feedback on the documented activities) or indirectly (by providing insights into the fascinating world of OSS distributions).

References

Adams B, De Schutter K, Tromp H, De Meuter W (2007) Design recovery and maintenance of build systems. In: Proceedings of the Intl. Conf. on Software Maintenance (ICSM), pp 114–123

Adams B, Kavanagh R, Hassan AE, German DM (2015) Replication package. http://mcis.polymtl.ca/publications/2014/integration_oss_distribution_adams_et_al.zip

Bac C, Berger O, Deborde V, Hamet B (2005) Why and how-to contribute to libre software when you integrate them into an in-house application? In: Proceedings of the 1st Intl. Conf. on Open Source Systems (OSS), pp 113–118

Basili VR, Briand LC, Melo WL (1996) How reuse influences productivity in object-oriented systems. Commun ACM 39(10):104–116

Begel A, Nagappan N, Poile C, Layman L (2009) Coordination in large-scale software teams. In: Proceedings of the 2009 ICSE Workshop on Cooperative and Human Aspects on Software Engineering (CHASE), pp 1–7

Bhuta J, Mattmann C, Medvidovic N, Boehm BW (2007) Framework for the assessment and selection of software components and connectors in COTS-based architectures.
In: Proceedings of the Working IEEE/IFIP Conf. on Software Architecture (WICSA), p 6

Information Technology Resources Board (1999) Assessing the risks of commercial-off-the-shelf applications. Technical report, ITRB

Boehm B, Abts C (1999) COTS integration: plug and pray? Computer 32(1):135–138

Bowman IT, Holt RC, Brewster NV (1999) Linux as a case study: its extracted software architecture. In: Proceedings of the 21st Intl. Conf. on Software Engineering (ICSE), pp 555–563

Brooks FP Jr (1995) The mythical man-month (anniversary edn.). Addison-Wesley Longman, Boston, MA

Brownsword L, Oberndorf T, Sledge CA (2000) Developing new processes for COTS-based systems. IEEE Softw 17(4):48–55

Brun Y, Holmes R, Ernst MD, Notkin D (2011) Proactive detection of collaboration conflicts. In: Proceedings of the 19th ACM SIGSOFT Symp. and the 13th European Conf. on Foundations of Software Engineering (ESEC/FSE), pp 168–178

Chen W, Li J, Ma J, Conradi R, Ji J, Liu C (2008) An empirical study on software development with open source components in the Chinese software industry. Softw Process 13:89–100

Cochran WG (1963) Sampling techniques, 2nd edn. John Wiley and Sons, New York

Coplien J, Hoffman D, Weiss D (1998) Commonality and variability in software engineering. IEEE Softw 15:37–45

Crnkovic I, Larssom M (2002) Challenges of component-based development. J Syst Softw 61(3):201–212

Curtis B, Krasner H, Iscoe N (1988) A field study of the software design process for large systems. Commun ACM 31(11):1268–1287

Dagenais B, Robillard MP (2008) Recommending adaptive changes for framework evolution. In: Proceedings of the 30th Intl. Conf. on Software Engineering (ICSE), pp 481–490

de Souza CRB, Redmiles D, Cheng L-T, Millen D, Patterson J (2004) Sometimes you need to see through walls: a field study of application programming interfaces. In: Proceedings of the 2004 ACM Conf. on Computer Supported Cooperative Work (CSCW), pp 63–71

de Souza CRB, Redmiles DF (2008) An empirical study of software developers' management of dependencies and changes. In: Proceedings of the 30th Intl. Conf. on Software Engineering (ICSE), pp 241–250

Debian project (2013) Project participants. http://www.debian.org/devel/people

Debian project (2011) Debian Developer's Reference, 2011 edn.

DeLine R (1999) Avoiding packaging mismatch with flexible packaging. In: Proceedings of the 21st Intl. Conf. on Software Engineering (ICSE), pp 97–106

Developer's Reference Team, Barth A, Di Carlo A, Hertzog R, Nussbaum L, Schwarz C, Jackson I (2011) Debian Developer's Reference. The Debian Project

Di Cosmo R, Di Ruscio D, Pelliccione P, Pierantonio A, Zacchiroli S (2011) Supporting software evolution in component-based FOSS systems. Sci Comput Program 76:1144–1160

Di Giacomo P (2005) COTS and open source software components: are they really different on the battlefield? In: Proceedings of the 4th Intl. Conf. on COTS-Based Software Systems (ICCBSS), pp 301–310

Dogguy M, Glondu S, Le Gall S, Zacchiroli S (2010) Enforcing type-safe linking using inter-package relationships. In: Proceedings of the 21st Journées Francophones des Langages Applicatifs (JFLA), 25 pp

Frakes W, Terry C (1996) Software reuse: metrics and models. ACM Comput Surv 28(2):415–435

Frakes WB, Kang K (2005) Software reuse research: status and future.
IEEE Trans Softw Eng 31:529–536

FreeBSD Porter's Handbook (2011). http://bit.ly/FQDPhP

The FreeBSD developers (2013). http://www.freebsd.org/doc/en/articles/contributors/staff-committers.html

Gaffney JE, Durek TA (1989) Software reuse – key to enhanced productivity: some quantitative models. Inf Softw Technol 31(5):258–267

Gamma E, Helm R, Johnson R, Vlissides J (1995) Design patterns: elements of reusable object-oriented software. Addison-Wesley Longman, MA

German DM, Gonzalez-Barahona JM, Robles G (2007) A model to understand the building and running inter-dependencies of software. In: Proceedings of the 14th Working Conf. on Reverse Engineering (WCRE), pp 140–149

German DM, Hassan AE (2009) License integration patterns: addressing license mismatches in component-based development. In: Proceedings of ICSE, pp 188–198

German DM, Webber JH, Di Penta M (2010) Lawful software engineering. In: Proceedings of the FSE/SDP Workshop on the Future of Software Engineering Research (FoSER), pp 129–132

Gonzalez-Barahona JM, Robles G, Michlmayr M, Amor JJ, German DM (2009) Macro-level software evolution: a case study of a large software compilation. Empir Softw Eng 14:262–285

Goode S (2005) Something for nothing: management rejection of open source software in Australia's top firms. Inf Manage 42(5):669–681

The BSD Certification Group (2005) BSD usage survey. Technical report, The BSD Certification Group

Hauge Ø, Ayala C, Conradi R (2010) Adoption of open source software in software-intensive organizations – a systematic literature review. Inf Softw Technol 52(11):1133–1154

Hauge Ø, Sørensen C-F, Conradi R (2008) Adoption of open source in the software industry. In: Proceedings of the 4th IFIP WG 2.13 Intl. Conf. on Open Source Systems (OSS), vol 275, pp 211–221

Herbsleb JD, Grinter RE (1999) Splitting the organization and integrating the code: Conway's law revisited. In: Proceedings of the 21st Intl. Conf. on Software Engineering (ICSE), pp 85–95

Herbsleb JD, Mockus A, Finholt TA, Grinter RE (2001) An empirical study of global software development: distance and speed. In: Proceedings of the 23rd Intl. Conf. on Software Engineering (ICSE), pp 81–90

Hertzog R (2011) Towards Debian rolling: my own Debian CUT manifesto. http://raphaelhertzog.com/2011/04/27/towards-debian-rolling-my-own-debian-cut-manifesto/

Jaaksi A (2007) Experiences on product development with open source software. In: Proceedings of the IFIP Working Group 2.13 Conf. on Open Source Software, vol 234, pp 85–96. Springer

Koshy J (2013) Building products with FreeBSD. http://www.freebsd.org/doc/en/articles/building-products/

Khomh F, Dhaliwal T, Zou Y, Adams B (2012) Do faster releases improve software quality? – an empirical case study of Mozilla Firefox. In: Proceedings of the 9th IEEE Working Conf. on Mining Software Repositories (MSR), pp 179–188, Zurich, Switzerland

Lemos R (2003) Microsoft details new security plan. http://news.cnet.com/Microsoft-details-new-security-plan/2100-1002.3-5088846.html

Lewis P, Hyle P, Parrington M, Clark E, Boehm B, Abts C, Manners R (2000) Lessons learned in developing commercial off-the-shelf (COTS) intensive software systems.
Technical report, SERC

Li J, Conradi R, Bunse C, Torchiano M, Slyngstad OPN, Morisio M (2009) Development with off-the-shelf components: 10 facts. IEEE Softw 26:80–87

Li J, Conradi R, Slyngstad OP, Torchiano M, Morisio M, Bunse C (2008) A state-of-the-practice survey of risk management in development with off-the-shelf software components. IEEE Trans Softw Eng 34:271–286

Li J, Conradi R, Slyngstad OPN, Bunse C, Khan U, Torchiano M, Morisio M (2005) An empirical study on off-the-shelf component usage in industrial projects. In: Proceedings of the 6th Intl. Conf. on Product Focused Software Process Improvement (PROFES), pp 54–68

van der Linden FJ, Schmid K, Rommes E (2007) Software product lines in action: the best industrial practice in product line engineering. Springer, Berlin Heidelberg

Van Der Linden F (2009) Applying open source software principles in product lines. Eur J Inform Prof (UPGRADE) 3:32–40

Lundqvist A (2013) GNU/Linux distribution timeline. http://futurist.se/gldt/

Mattsson M, Bosch J, Fayad ME (1999) Framework integration problems, causes, solutions. Commun ACM 42(10):80–87

McCamant S, Ernst MD (2003) Predicting problems caused by component upgrades. In: Proceedings of the Symposium on the Foundations of Software Engineering (FSE), pp 287–296

McIntosh S, Adams B, Kamei Y, Nguyen T, Hassan AE (2011) An empirical study of build maintenance effort. In: Proceedings of ICSE, pp 141–150

Merilinna J, Matinlassi M (2006) State of the art and practice of open source component integration. In: Proceedings of the 32nd Conf. on Software Engineering and Advanced Applications (EUROMICRO), pp 170–177

Meyer MH, Lehnerd AP (1997) The power of product platforms. Free Press, New York

Michlmayr M, Hunt F, Probert D (2007) Release management in free software projects: practices and problems. In: Open Source Development, Adoption and Innovation, vol 234, pp 295–300

Mistrík I, Grundy J, van der Hoek A, Whitehead J (2010) Collaborative software engineering: challenges and prospects, chapter 19, 1st edn. Springer, Berlin Heidelberg, pp 389–402

Mohamed A, Ruhe G, Eberlein A (2008) Optimized mismatch resolution for COTS selection. Softw Process 13(2):157–169

Morisio M, Seaman CB, Basili VR, Parra AT, Kraft SE, Condon SE (2002) COTS-based software development: processes and open issues. J Syst Softw 61(3):189–199

Navarrete F, Botella P, Franch X (2005) How agile COTS selection methods are (and can be)? In: Proceedings of the 31st EUROMICRO Conf. on Software Engineering and Advanced Applications (EUROMICRO), pp 160–167

Orsila H, Geldenhuys J, Ruokonen A, Hammouda I (2008) Update propagation practices in highly reusable open source components. In: Proceedings of the 4th IFIP WG 2.13 Intl. Conf. on Open Source Systems (OSS), vol 275, pp 159–170

Parnas DL (1976) On the design and development of program families. IEEE Trans Softw Eng 2:1–9

Pohl K, Böckle G, van der Linden FJ (2005) Software product line engineering: foundations, principles and techniques. Springer, New York

Remnant SJ (2011) A new release process for Ubuntu? http://netsplit.com/2011/09/08/new-ubuntu-release-process/

Rodin J, Aoki O (2011) Debian New Maintainers' Guide. The Debian Project

Ruffin M, Ebert C (2004) Using open source software in product development: a primer.
IEEE Softw 21(1):82–86

Sadowski BM, Sadowski-Rasters G, Duysters G (2008) Transition of governance in a mature open software source community: evidence from the Debian case. Inf Econ Policy 20(4):323–332

Scacchi W, Feller J, Fitzgerald B, Hissam S, Lakhani K (2006) Understanding free/open source software development processes. Softw Process Improv Pract 11(2)

Seaman CB (1996) Communication costs in code and design reviews: an empirical study. In: Proceedings of the 1996 Conf. of the Centre for Advanced Studies on Collaborative Research (CASCON), pp 34–. IBM Press

Shihab E, Bird C, Zimmermann T (2012) The effect of branching strategies on software quality. In: Proceedings of the ACM/IEEE Intl. Symp. on Empirical Software Engineering and Measurement (ESEM), pp 301–310

Shuttleworth M (2008) The art of release. http://www.markshuttleworth.com/archives/146

Sojer M, Henkel J (2010) Code reuse in open source software development: quantitative evidence, drivers, and impediments. J Assoc Inf Syst 11(12)

Spinellis D, Szyperski C (2004) Guest editors' introduction: how is open source affecting software development? IEEE Softw 21(1):28–33

Stol K-J, Babar MA, Avgeriou P, Fitzgerald B (2011) A comparative study of challenges in integrating open source software and inner source software. Inf Softw Technol 53(12):1319–1336

Szyperski C (1998) Component software: beyond object-oriented programming. Addison-Wesley, MA

The Fedora Project (2011) Package update HOWTO. http://fedoraproject.org/wiki/Package_update

The FreeBSD Documentation Project (2011) FreeBSD Porter's Handbook. The FreeBSD Foundation

Tiangco F, Stockwell A, Sapsford J, Rainer A, Swanton E (2005) Open-source software in an occupational health application: the case of Heales Medical Ltd. In: Proceedings of the 1st Intl. Conf. on Open Source Systems (OSS), pp 130–134

Trezentos P, Lynce I, Oliveira AL (2010) Apt-pbo: solving the software dependency problem using pseudo-boolean optimization. In: Proceedings of the IEEE/ACM Intl. Conf. on Automated Software Engineering (ASE), pp 427–436

Tu Q, Godfrey M (2001) The build-time software architecture view. In: Proceedings of ICSM, pp 398–

MOTU team (2013). https://launchpad.net/%7Emotu/+members

Ubuntu core development team (2013). https://launchpad.net/%7Eubuntu-core-dev/+members

Ubuntu universe contributors team (2013). https://launchpad.net/universe-contributors/+members

van der Hoek A, Wolf AL (2003) Software release management for component-based software. Softw Pract Exper 33:77–98

Ven K, Mannaert H (2008) Challenges and strategies in the use of open source software by independent software vendors. Inf Softw Technol 50(9-10):991–1002

Whittaker J, Arbon J, Carollo J (2012) How Google tests software. Addison-Wesley Professional, MA

Comparison of BSD operating systems (2011). http://en.wikipedia.org/wiki/Comparison_of_BSD_operating_systems

Xia X, Lo D, Zhu F, Wang X, Zhou B (2013) Software internationalization and localization: an industrial experience. In: Proceedings of the 18th Intl. Conf. on Engineering of Complex Computer Systems (ICECCS), pp 222–231

Yakimovich D, Bieman JM, Basili VR (1999) Software architecture classification for estimating the cost of COTS integration. In: Proceedings of the 21st Intl. Conf. on Software Engineering (ICSE), pp 296–302

Bram Adams is an assistant professor at Polytechnique Montréal (Canada).
He obtained his PhD at the GHSEL lab at Ghent University (Belgium), and was an adjunct assistant professor in the Software Analysis and Intelligence Lab at Queen's University (Canada). His research interests include software release engineering in general, as well as software integration and software build systems in particular. His work has been published at premier software engineering venues such as TSE, ICSE, FSE, ASE, EMSE, MSR and ICSME. In addition to co-organizing RELENG 2013 to 2015 (and the first IEEE Software special issue on release engineering), he co-organized the PLATE, ACP4IS, MUD and MISS workshops, and the MSR Vision 2020 Summer School. He is PC co-chair of SCAM 2013, SANER 2015 and ICSME 2016.

Ryan Kavanagh is a Bachelor of Computing (Honours) student in Computing and Mathematics at Queen's University. He has been a research assistant at Dr. Hassan's SAIL lab, at McGill University, and at Microsoft Research Cambridge. Ryan started contributing to Ubuntu and its derived distributions in February 2006 (while still in high school), and in December 2011 he became an official Debian developer. In his spare time, Ryan is an avid piper, with various Canadian titles to his name.

Ahmed E. Hassan is the Canada Research Chair (CRC) in Software Analytics and the NSERC/BlackBerry Software Engineering Chair at the School of Computing at Queen's University, Canada. His research interests include mining software repositories, empirical software engineering, load testing, and log mining. Hassan received a PhD in Computer Science from the University of Waterloo. He spearheaded the creation of the Mining Software Repositories (MSR) conference and its research community. Hassan also serves on the editorial boards of IEEE Transactions on Software Engineering, the Springer Journal of Empirical Software Engineering, and the Springer Journal of Computing. Contact him at ahmed@cs.queensu.ca.

Daniel German is a professor of Computer Science at the University of Victoria. He completed his PhD at the University of Waterloo in 2000. His work spans the areas of mining software repositories, open source, and intellectual property in software engineering.
{"id": "5b0e275f3e1b2d41806464a25a563b8cc984d495", "text": "Managing Episodic Volunteers in Free/Libre/Open Source Software Communities\n\nAnn Barcomb, Klaas-Jan Stol, Brian Fitzgerald, and Dirk Riehle\n\nAbstract\u2014We draw on the concept of episodic volunteering (EV) from the general volunteering literature to identify practices for managing EV in free/libre/open source software (FLOSS) communities. Infrequent but ongoing participation is widespread, but the practices that community managers are using to manage EV, and their concerns about EV, have not been previously documented. We conducted a policy Delphi study involving 24 FLOSS community managers from 22 different communities. Our panel identified 16 concerns related to managing EV in FLOSS, which we ranked by prevalence. We also describe 65 practices for managing EV in FLOSS. Almost three-quarters of these practices are used by at least three community managers. We report these practices using a systematic presentation that includes context, relationships between practices, and concerns that they address. These findings provide a coherent framework that can help FLOSS community managers to better manage episodic contributors.\n\nIndex Terms\u2014Best practices, community management, episodic volunteering, free software, open source software\n\n1 INTRODUCTION\n\nFree/Libre/Open Source Software (FLOSS) research has traditionally divided contributors into core and periphery, where core describes the minority of top developers who contribute 80 percent of the code and the periphery describes all other developers [1], [2], [3]. This focus on the volume of contributions assumes a homogenized periphery, without any further distinction within that group. Further, by its very definition this distinction has an exclusive focus on code contributions, ignoring the many other types of contributions that are made to FLOSS projects. To better understand the periphery of FLOSS communities, several researchers have begun to differentiate participants within the periphery, based on the frequency and duration of their participation [4], [5], [6], [7]. In earlier work, we have drawn upon the concept of episodic volunteering (EV) from the volunteering literature to describe the subset of peripheral contributors whose contributions are short-term or infrequent [8], [9], in contrast to habitual contributors, whose contributions are \u201ccontinuous or successive\u201d [10]. In so doing, we have also reconsidered the definition of contribution, expanding it from software (or code) contribution to any type of activity within a FLOSS community [6]. By using this alternative lens on FLOSS communities, we found evidence for a wide range of contributions that episodic volunteers have made [6]. Based on a qualitative survey of 13 FLOSS communities, we developed a detailed understanding from the perspectives of both episodic volunteers and community managers. Based on this, we established an initial set of recommendations to engage episodic volunteers. A key concern in the context of episodic volunteering is whether these volunteers return to make further contributions. Drawing on the general volunteering literature, we evaluated a theoretical model that helps explain retention of episodic volunteers.\n\nIn this article we extend this line of research on EV in FLOSS communities. Episodic contributors represent a class of participants that can make a wide range of valuable contributions to FLOSS projects [6]. 
By their very nature, their participation is incidental rather than continuous, and so it is of particular interest to understand how episodic contributors can be "retained," which in this context refers to their returning to a project to contribute again, rather than to converting them into habitual contributors. Retention is appealing because returning contributors require less assistance than newcomers [11], and retention is one of the key factors in FLOSS project sustainability [12], [13], [14], [15], [16]. However, evidence from the general volunteering literature suggests that many organizations do not have clear strategies in place to effectively manage episodic contributors [11], [17]. Organizations may also face internal resistance in implementing such changes, as episodic contributors may be negatively perceived as costing more in resources than they deliver in contributions [18].

Despite these challenges, EV is an increasingly important topic in volunteer management due to the increase in, and preference for, this kind of work [8], [19], [20], [21], [22]. Adapting to the changing volunteering context is necessary for the sustainability of non-profit organizations [22]. In FLOSS, it has long been observed that many contributors are episodic, for instance in the case of bug reporting [2], [6], [23], [24], [25]. Furthermore, a number of benefits have been attributed to peripheral contributors, for example the increased identification of legal issues such as copyright infringement, and high-quality bug fixes [14], [26]. Hence, given the increased recognition of the importance of episodic volunteers and their contributions, it is imperative to study how to manage episodic volunteers in FLOSS communities.

A major change in FLOSS communities over the last decade has been the increase in firms' involvement in open source development, although volunteers remain important participants [27], [28], [29]. Many companies in different sectors use software which is developed by external FLOSS projects [30], and consequently many firms now employ developers to contribute to specific open source projects that they identify as critical to their business. Paid development does not negate the need to understand episodic participation. Even in company-dominated FLOSS communities, external developers still contribute a significant proportion of commits [31]. Additionally, from the perspective of the community, paid developers employed by external firms cannot be directed as employees [32], [33]. Although there are differences between paid contributors and other participants [28], paid contributors' participation is sometimes episodic from the perspective of the community. Our research considers episodic participation from the community perspective, and consequently we adopt the broadest definition of volunteering, encompassing anyone engaging in FLOSS contributions who is not directly sponsored by the FLOSS community [6]. This broad definition allows us to identify practices which can actually be used by communities, without any concern for whether or not contributors are paid or sponsored by a firm. Where paid contributors affect community managers' concerns and practices, this is explicitly noted in our findings.

FLOSS research has been challenged for its reliance on studying forms of participation which can be readily observed through data mining, notably code contributions, bug reports, and mailing lists [34], [35].
The exclusion of non-code contributors limits the applicability of research on larger FLOSS communities, which depend not only on code contributions but also on a wide range of other activities, such as planning, advocacy, mentoring, and event organization [35], [36], [37], [38]. Both unpaid and paid contributors can participate in a range of activities within FLOSS communities [39].

Despite extensive research on community practices, e.g., [3], only two studies have focused specifically on episodic participation, and neither focused on identifying an extensive list of practices [6], [40]. The fact that specific practices have been proposed for other peripheral sub-groups, namely newcomers [41], [42], suggests that FLOSS communities may be using different practices, or adapting existing practices to different ends, in order to manage episodic contributors. Hence, our study had the following objectives:

1) Identify the concerns community managers have about episodic volunteers.
2) Identify the practices that community managers are using, or envisage using, to address their concerns about episodic volunteers.

To address these objectives, we conducted a Delphi study, a structured communication technique involving a panel of experts. We drew on the experience of FLOSS community managers to identify the concerns community managers have about EV, the practices they use (or consider using) to manage EV, and preliminary suggestions for how practices could be combined. This article makes the following contributions toward understanding the management of EV in FLOSS:

- A prioritized list of 16 EV community manager concerns;
- An extensive collection of practices which might be used to manage EV (74 percent of which are being used by at least three community managers), including connections to the concerns previously identified as well as relationships between practices;
- Workflows proposed by community managers which demonstrate how practices can be combined.

The remainder of the article is organized as follows. Section 2 reviews previous work on open source communities, volunteers, and in particular the role of episodic contributors. Section 3 presents the Delphi research approach that we adopted, including a discussion of participant selection, data collection, and data analysis procedures. Section 4 presents the findings of the study in the form of a set of practices and concerns. Section 5 concludes by discussing our findings, the limitations of the study, and an outlook on future work.

2 RELATED WORK

This section reviews prior work on peripheral contributors and episodic volunteering in FLOSS communities.

2.1 Peripheral Contributors in FLOSS Communities

One of the earliest conceptions of the structure of FLOSS communities is the so-called Onion model [1], [43]. The Onion model depicts increasing numbers and decreasing engagement when moving from the innermost core to the outermost passive users. The core contains the most prolific developers, often described as the people who create 80 percent of the code [2]. Beyond the core is the periphery, whose members contribute fewer lines of code.

Although much of the earlier research focused on the core (e.g., [2], [24]), there is now significant understanding of both the importance of the periphery and the motivations of peripheral participants.
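The "80 percent of the code" reading of the core above is commonly operationalized as the smallest set of committers that jointly account for 80 percent of the contributions. As a minimal sketch of that operationalization (using commit counts as a proxy for code contributed, with hypothetical author names):

```python
from collections import Counter

def core_contributors(commit_authors, share=0.8):
    """Smallest set of authors jointly responsible for `share` of all commits."""
    counts = Counter(commit_authors)
    total, covered, core = sum(counts.values()), 0, []
    for author, n in counts.most_common():
        if covered >= share * total:
            break
        core.append(author)
        covered += n
    return core

# toy log: two prolific authors plus a long tail of occasional contributors
log = ["alice"] * 50 + ["bob"] * 30 + ["carol", "dave", "erin"] * 2 + ["frank"] * 14
print(core_contributors(log))  # -> ['alice', 'bob']
```

Everyone outside the returned set falls into the periphery, which the work discussed below further disaggregates.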
Peripheral contributors provide a range of benefits:

- Bringing new knowledge to the project [26], [44], [45], [46];
- Raising awareness of the project [46], [47], [48];
- Providing new potential core contributors [26], [45], [49], [50], [51];
- Proposing new features [44], [52];
- Contributing new code [26], [44], [45], [53];
- Finding and reporting bugs [54];
- Ensuring members' behavior abides by community norms [26].

FLOSS developer motivations have been extensively studied. Motives are usually characterized as *intrinsic* motives, inherent to the job, such as altruism and enjoyment; *internalized extrinsic* motives, such as reputation and reciprocity; and *extrinsic* motives, such as career and salary [55]. Peripheral contributors tend to have the same set of motivations as core developers [37], but those with extrinsic motives are less likely to continue to participate [45], [56]. In particular, extrinsic motives, such as the desire to build a reputation and gain recognition from stakeholders, are more widespread among peripheral developers than among core developers [45], and peripheral contributors are more likely to seek out opportunities which afford them such recognition [45].

Recent work has begun to study the periphery more closely to identify and distinguish different types of contributors. One dimension often used to distinguish them is the frequency of participation: groups distinguished in this way include newcomers [41], [57], [58], [59], [60], [61], [62], people who attempt to become contributors [63], and one-time contributors [5], [40], [56]. In earlier work, we linked the general episodic volunteering literature to the periphery [6]. The disaggregation of the periphery by frequency of contribution can also be viewed as an extension of, rather than a departure from, the Onion model. The outer layers (active users and passive users) are already defined by their own actions, irrespective of the contributions of others. Active users engage with the project, for instance by supplying bug reports, while passive users only use the software. Disentangling the homogenized periphery into sub-categories distinguished by frequency of participation refines the Onion model and allows for the identification of distinct attributes of different groups within the periphery.

In the Onion model, the different layers describe how people contribute to the software, whereas FLOSS projects include many other ways to get involved [35], [36], [37]. Carillo and Bernard [64] described this code-centricity as a limitation:

> "By stereotyping FOSS projects as communities of developers loosely collaborating on a FOSS-licensed software project via an online project platform, we disregard the massive amount of information that is not captured on platforms and also neglect the myriad of non-code related tasks and roles without which a project could not be what it is."

An emphasis on code contributions within FLOSS communities may not only devalue other types of contributions, but may specifically disadvantage women [65]. Other studies have found that women's participation in FLOSS remains low in both code and non-code activities, including leadership [66], [67], [68].
Nafus's [65] participant observation study of FLOSS contributors found that "men monopolize code authorship and simultaneously de-legitimize the kinds of social ties necessary to build mechanisms for women's inclusion." Research has also demonstrated that some barriers to entry for newcomers are gendered [60], [69], and that gender may influence retention among episodic contributors [7]. Because code contributors do not represent the entire community in terms of the diversity of work, and may additionally be demographically unrepresentative, we argue for the importance of including non-code contributions in our study. This makes the EV concept, which originates in the general volunteering literature rather than the software engineering literature, an appropriate lens for the study, because it places no particular emphasis on any one type of contribution.

### 2.2 Episodic Volunteering

Episodic volunteering is a term from the general volunteering literature describing short-term or infrequent participation. Although a particular engagement may be of limited duration, retention of episodic contributors is possible. In the context of EV, retention does not mean conversion to habitual participation, but repeated engagement with the same organization. In a systematic review of the EV literature, Hyde et al. [70] identified retention as a key topic in need of further research. Retention remains a compelling subject because returning volunteers require less training [11] and retention is one measure of stability in FLOSS [13], [14], [15]. The general volunteering literature on the retention of episodic contributors has largely focused on explaining the factors that lead to retention, such as satisfaction with the previous volunteering experience, intention to return, and availability [10], [71], [72]. In the FLOSS domain, Steinmacher et al. [73] found that higher-quality email responses encouraged retention among newcomers. Meanwhile, Labuschagne and Holmes [57] critically examined Mozilla's onboarding programs and found that they may not result in long-term contributors, despite the fact that mentored newcomers consider the program valuable. A study evaluating five potential EV retention factors found that satisfaction, community commitment, and social norms correlate with the intention to remain [7].

Another important problem in general volunteering is how organizations incorporate EV [17]. Although EV is sometimes viewed as disruptive, it is widespread and a reality that requires organizations to reconsider their strategies [18], [19], [45], [74]. Volunteer agencies can adjust to the expectations of episodic contributors by offering more flexibility in commitment, reducing training requirements, increasing the social element of service, and recognizing volunteers [75]. Volunteer coordinators can also identify tasks that are suitable for episodic contributors, which may include one-off contributions at events and ongoing but non-specialized work [11]. The evaluation of suitable tasks can be done systematically by applying a 'volunteer scenario' approach that categorizes volunteer assets, volunteer availability, and potential assignments [76].

While no single work has collected a comprehensive set of practices for managing EV in FLOSS, previous studies have proposed practices for managing FLOSS contributors.
Previously, we identified 20 potential practices for EV management by evaluating existing FLOSS practices in light of the factors associated with the retention of episodic contributors and prior general volunteering recommendations [6]. Meanwhile, Steinmacher et al. [41] identified nine practices for communities onboarding new contributors, along with corresponding recommendations for new contributors. We consider practices for newcomers relevant to the study of EV because community managers cannot distinguish the future episodic volunteer from the future habitual volunteer [72] at the time of their first contribution.

This study extends this line of work by drawing on the expertise of community managers; at the time of our first study [6], we had found very limited evidence of community managers actively managing EV. Our approach increases the scope and number of practices identified in two ways. First, we examine both practices which are already being used to manage EV and practices that experts think might be appropriate, and we distinguish between speculation and observed practice. Second, we look at most of the volunteer process, from onboarding to retention, excluding only recruitment.

3 STUDY DESIGN

In this section we outline the Delphi research method, and elaborate on the participant selection, data collection, and analysis methods.

3.1 Research Method

Our research is concerned with understanding current practices for managing episodic contributors, and also proposes practices that may be helpful for managing EV. The Delphi method was developed as a way of eliciting the collective opinion of a group of experts, and works on the assumption that multiple experts are better able to arrive at accurate solutions to problems. Anonymity between participants is used to prevent participants with high status or reputation from having a disproportionate influence [77], [78], [79]. The Delphi approach is suitable for complex problems [80], when solutions do not yet exist and may be best explored through the subjective judgments of an informed group of experts [77], [81].

While not common in software engineering research, the Delphi method has previously been used to study complex topics such as the tailoring of agile methods [82] and the adoption of tools by FLOSS developers [83]. Delphi studies typically comprise several rounds of data collection; as participants are exposed to new information in every round, they may develop new insights through iteration and exposure to others' ideas. The Delphi method can also be conducted asynchronously, which was of particular importance in our context given the geographic distribution of open source experts.

The traditional Delphi method focuses on achieving consensus. As the method has evolved, a variant known as the policy Delphi has emerged. A policy Delphi study is appropriate when the purpose of the study is not to establish consensus but to identify the main arguments and positions [77]. We decided that a policy Delphi study would be more appropriate in our context than a traditional Delphi study, because we recognized that communities may have different goals when managing EV, driven by community size, cultural context, or the types of contribution being considered. We wanted to articulate these constraints in order to provide context for the practices, rather than assume that one approach would be effective for all communities and all activities within communities.
However, we were also interested in generalizing common practices and concerns, and we used the collation of the different rounds of data collection to achieve a consensus of opinions.

We codify the results of our research in the form of a collection of practices in the appendix [84], which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TSE.2020.2985093. This ensures that the fruits of our research can be used by practitioners, a key goal of this work.

EV management includes all phases of the volunteer management process. We explicitly excluded recruitment practices from consideration in our study because many of them are not specific to episodic volunteering. This focus was necessary to limit the scope of the study, which could otherwise have overwhelmed the participants and diffused their focus. Although onboarding is another area where we expect overlap between habitual and episodic management, we decided to retain this part of the process in order to compare our results with a recent study summarizing onboarding practices for newcomers [41].

3.2 Participant Selection

Participant selection is a key aspect of a successful Delphi study [85]. Participants must be selected with care, and not chosen simply on the basis of availability [86].

We sought to select a panel of 20 to 25 participants, to ensure sufficient diversity even if some participants were to stop participating in the study. This is within the recommended range of 15–30 participants [87]. Potential participants were identified in one of three ways. First, some approached us directly following presentations at practitioner conferences. Second, we identified people among our contacts, as well as people who were recommended to us by contacts; from these two groups we approached a subset that met our selection requirements, described below. Third, we evaluated gaps in our coverage and sent cold emails to people we identified through online searches. The selection of participants was based not only on their enthusiasm for participation or their connection to us, but also on the degree of diversity along the three selection dimensions (discussed below), as well as on our expectation that the participants would be able to provide relevant input. Additionally, although gender has not, to our knowledge, been directly linked to community management, our awareness that gender can affect FLOSS participation experiences [60] inspired us to deliberately recruit female participants. In total, one-third of our participants were female. Table 1 summarizes the participants by community and their participation in the different rounds of our study.

To gain the full benefit of multiple perspectives, the participants of a Delphi study should be diverse rather than homogeneous [88]. We identified three dimensions relevant to our study along which we expected differences of opinion to arise: size of community, contribution type, and country. We discuss each in detail below.

3.2.1 Size of Community

A previous study investigating the current state of EV in FLOSS discovered that the tasks considered appropriate for episodic contributors vary by community size [6]. For example, in smaller communities, translation is an ad hoc task well-suited to EV. Larger communities have more complicated rules for translating, and full cognizance of those rules requires more habitual participation. Organization size is also a factor commonly considered in studies identifying best practices.
For example, in their case study of best practices for volunteer organizations, Carvalho and Sampaio [89] considered the size of volunteer organizations in terms of the numbers of beneficiaries, paid employees, and volunteers. Because there are many different ways to operationalize community size (number of users, number of developers, size of the core) and because size is more continuous than categorical, we did not categorize communities by size, but instead sought to include a number of communities of different sizes.

All communities represented by our panel experts have more than a handful of contributors. This is justified because extremely small communities tend not to be concerned with developing a volunteer management process or workflow. The communities represented are shown in Table 1. In total, 22 communities were represented, and four of these communities (Debian, Ubuntu, KDE, OpenStack) were represented twice. Detailed descriptions of each community are provided in the appendix [84], available in the online supplemental material.

### 3.2.2 Contributor Activities

Much of FLOSS research has been code-centric, but in large communities people work on a number of activities, such as translation and maintaining web services [35]. Our earlier study on EV in FLOSS found that while episodic contributors can engage in all activities, some areas are considered more suitable than others, depending on the community [6]. We expected that the perspective of community managers might be influenced by the activities they engage in. We used the classification system that Rozas [38] introduced to describe the Drupal community, because it contains the most comprehensive categorization of FLOSS activities.

### 3.2.3 Country

FLOSS communities are international, although North American and European countries are disproportionately over-represented [90]. Geographic boundaries can be eliminated, but cultural barriers may remain. For example, in 2002, Nakakoji et al. [1] explained that Japanese programmers were reluctant to communicate directly with GNU GCC core developers because they saw them as superior programmers and wanted to keep a "respectful distance." One difficulty with identifying cultural diversity is increasing globalization, which has led to intercultural identities and identification not only with the country of birth, but also with the country of residence [91], [92]. We therefore considered both the country of origin and the country of residence.

Our participants represented 23 countries, spanning all populated continents: Argentina, Australia, Brazil, Cyprus, Czech Republic, France, Germany, Hungary, India, Ireland, Italy, Japan, Kenya, Peru, Romania, Singapore, Spain, South Korea, Tunisia, Uganda, Ukraine, the United Kingdom, and the United States. The appendix provides details about participants' countries of residence and origin [84], available in the online supplemental material.

### 3.3 Data Collection and Analysis

Data collection was initiated in January 2018 and concluded in October 2018. The study comprised three rounds, as shown in Fig. 1.

In the first round, participants were asked to think of any concerns they had about EV, and how they might address those concerns. All participants were engaged in community management, which was a precondition for participating in the study. Our participants had experience with close to six contribution categories on average, and all were involved in multiple types of contributions.
Table 2 shows a paraphrased list of contribution types along with a count of how many participants were engaged in each activity. The appendix provides a detailed list of each participant\u2019s contribution types [84], available in the online supplemental material.\n\n### Table 1: Study Participants by Community and Study Participation\n\n| ID | Community | Rounds participated |\n|-----|--------------------|---------------------|\n| CM1 | (Anonymous) | \u2713 |\n| CM2 | Apache, RDO | \u2713 |\n| CM3 | ChakraLinux | \u2713 |\n| CM4 | CHAOSS | \u2713 |\n| CM5 | Debian | \u2713 |\n| CM6 | Drupal | \u2713 |\n| CM7 | Fedora | \u2713 |\n| CM8 | Fedora | \u2713 |\n| CM9 | Joomla! | \u2713 |\n| CM10| KDE, NextCloud | \u2713 |\n| CM11| KDE, Kubuntu | \u2713 |\n| CM12| Linux Mint, Debian| \u2713 |\n| CM13| Mozilla | \u2713 |\n| CM14| Mozilla | \u2713 |\n| CM15| OpenChain | \u2713 |\n| CM16| OpenStack, Debian | \u2713 |\n| CM17| OpenStack | \u2713 |\n| CM18| OSGeo-Live | \u2713 |\n| CM19| Perl | \u2713 |\n| CM20| PostgreSQL | \u2713 |\n| CM21| Python | \u2713 |\n| CM22| Ubuntu | \u2713 |\n| CM23| Ubuntu | \u2713 |\n| CM24| Women who Code | \u2713 |\n\n### Table 2: Number of Participants Engaged by Contribution Type Based on [38]\n\n| Name | Description | No. |\n|-----------------------|--------------------------------------------------|-----|\n| Source code | Write code, review code, report bugs | 14 |\n| Documentation | Write, report issues | 14 |\n| Translation | Translate and review translation | 9 |\n| Design | User experience design, visual design, style guide creation | 6 |\n| Support | Participate in support fora, create cookbooks | 11 |\n| Evangelizing | Blog posts, speaking at unrelated events, marketing | 19 |\n| Mentoring | Creation of training materials, mentoring contributors | 15 |\n| Community management | Participation in working and local groups, conflict resolution, governance | 24 |\n| Events | Organization of events, speaking at events | 18 |\n| Economic | Make donations and seek sponsors | 12 |\nthose concerns. The purpose of this round was to generate a broad overview of the concerns and problems affecting communities.\n\nCollating this round involved identifying all the unique concerns by name and description, and creating a list of all the unique practices by name, description, and associated concerns.\n\nIn the second round, we sought to refine our understanding of both concerns and practices. For the concerns, this entailed collecting information on the prevalence and ranking of concerns, while for the practices we elicited relationships between practices, specifically the preceding/subsequent and complementary relationships, and possible workflows. The collation for this round focused on more elaborate descriptions of practices, and reported on the ranking of concerns. Workflows were also shown.\n\nThe third round involved refining the information we had gathered on practices. Participants were asked to verify if they had used or only proposed a practice, and were asked to specify any relationships, context, or limitations which our earlier analyses had missed. The collation consisted of the most extended description of practices.\n\nIn each round, questions were posted and participants were given several weeks to respond. At the end of the period, reminders were sent to participants who had not yet responded, and the response time was extended.\n\nAfter all responses were received, they were analyzed by the lead author using the QDAcity tool for qualitative data analysis. 
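The collation itself is qualitative work, but the prevalence and ranking figures that later appear in Table 3 reduce to a simple tally over the coded responses. The following is a minimal sketch of that tallying step in Python; the `CodedResponse` records are hypothetical stand-ins for data exported from QDAcity, not the study’s actual instrument.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class CodedResponse:
    """Hypothetical coded response from one participant."""
    participant: str
    observed: set[str]      # concerns the participant observed
    top_three: list[str]    # most pressing concerns, in rank order

# Illustrative data only; the real coding was done in QDAcity.
responses = [
    CodedResponse("CM5", {"2.C", "3.C", "11.C"}, ["2.C", "11.C", "3.C"]),
    CodedResponse("CM6", {"2.C", "10.C"}, ["10.C", "2.C"]),
]

observed = Counter()
ranked = {1: Counter(), 2: Counter(), 3: Counter()}

for response in responses:
    observed.update(response.observed)
    for place, concern in enumerate(response.top_three, start=1):
        ranked[place][concern] += 1

# List concerns by how many managers observed them, with rank counts,
# mirroring the columns of Table 3.
for concern, count in observed.most_common():
    print(concern, count,
          ranked[1][concern], ranked[2][concern], ranked[3][concern])
```

A tally like this can be regenerated directly from the updated coding after each round, so the collation sent back to participants always reflects the latest responses.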
From Round II onward, the collation was presented to all authors and participants as a collection of practices, also known as a handbook [94]. The collation was sent to participants after each round as a form of member checking [95]. Additionally, after Round III, participants were supplied with a list of practices attributed to them, giving them the opportunity to challenge our interpretation. Participants were given one week to suggest modifications to the collation, and were then sent the revised document. In the first two rounds we received minor requests for changes, while in the final round we received only acknowledgements of receipt.

Responses to each round were anonymized and then sent to the respondents to confirm that the modifications did not obscure the message. Analysis was conducted on the original responses, but the anonymized responses were used to provide quotations for the collations. Quotations were attributed to individual study participants by means of an assigned two-letter code. Each participant was able to identify their own contributions, and could also build up an impression of other study participants as individuals, without knowing their identities.

4 Results

This section presents the results of our study. Section 4.1 discusses concerns associated with managing episodic contributors, Section 4.2 focuses on the practices for managing episodic contributors, and Section 4.3 extends relationships between practices into workflows.

### 4.1 Concerns With Episodic Volunteering

We identified a set of concerns that community managers have about EV. Broadly, community managers have concerns about knowledge transmission between the community and episodic participants, the suitability of episodic contributors for tasks, how effectively community processes support EV, and how episodic contributors are included in the community. In total, we identified sixteen concerns regarding episodic volunteering in these communities. Table 3 lists all sixteen concerns by category, how frequently each was observed, and how many participants ranked it among their top three most pressing concerns.

Space limitations preclude us from discussing all concerns; we illustrate the most common concerns in more detail below. The complete set of concerns is described in the appendix [84], available in the online supplemental material.

Concern 2.C Episodic contributor lacks awareness of opportunities to contribute was deemed most important, observed by 20 community managers and ranked as the most pressing concern by eight study participants.
One community manager expressed this urgency as follows:

“Keeping volunteers interested by openly sharing opportunities where they can contribute (technical or non-technical) should be given priority.” —CM

Concern: 2.C Episodic contributor lacks awareness of opportunities to contribute

Communicating opportunities to get involved in a way that reaches episodic contributors is a concern for communities, especially when the people who are aware of tasks which could be done episodically do not enjoy outreach activities.

A key characteristic of episodic volunteers is that they contribute irregularly and that their participation tends to be of short duration. This lack of day-to-day engagement with a project means that episodic volunteers may simply not be aware of the opportunities to contribute.

### Table 3: Concerns by Category, with the Number of Community Managers Observing Each Concern and the Number of Times It Was Ranked as the First, Second, or Third Most Important Concern

| Concern | Obs. No. | No. #1 | No. #2 | No. #3 |
|------------------------------------------------------------------------|----------|--------|--------|--------|
| Knowledge exchange | | | | |
| 1.C Episodic contributor lacks knowledge of developments during absences | 10 | 1 | 1 | 1 |
| 2.C Episodic contributor lacks awareness of opportunities to contribute | 20 | 8 | 1 | 4 |
| 3.C Community lacks knowledge of availability of episodic contributors | 15 | 2 | 1 | 2 |
| 4.C Episodic contributor lacks understanding of project vision | 11 | 1 | 2 | 1 |
| 5.C Episodic contributor and community have mismatched expectations | 13 | 1 | 1 | 1 |
| Suitability of episodic contributors for the work | | | | |
| 6.C Episodic contributor quality of work is insufficient | 9 | 2 | 0 | 0 |
| 7.C Episodic contributor’s timeliness and completion of work is poor | 14 | 1 | 1 | 1 |
| 8.C Community’s cost of supervision exceeds benefit of episodic contribution | 8 | 1 | 1 | 1 |
| Community processes do not support EV | | | | |
| 9.C Community cannot retain episodic contributors for sporadic requirements | 8 | 0 | 1 | 2 |
| 10.C Community has difficulty identifying appropriate tasks for episodic contributors | 15 | 1 | 4 | 2 |
| 11.C Community lacks an episodic strategy | 14 | 2 | 6 | 1 |
| 12.C Community insufficiently supports episodic contributors | 4 | 0 | 0 | 0 |
| Marginalization of episodic contributors | | | | |
| 13.C Community restricts episodic contributors from leadership roles | 12 | 1 | 1 | 1 |
| 14.C Community excludes episodic contributors from discussions and decisions | 10 | 2 | 0 | 3 |
| 15.C Community gives episodic contributors reduced access to opportunities and rewards | 5 | 0 | 0 | 0 |
| 16.C Community lacks appreciation for and recognition of episodic contributors | 9 | 0 | 1 | 1 |

Fifteen community managers observed 3.C Community lacks knowledge of availability of episodic contributors, and two considered it their primary concern. One community manager described the issue for in-person events such as conferences:

“This [lack of knowledge] is a big problem when working with online communities, but it can grow exponentially when you are working a live event. You may do a call for volunteers, and you may end up short-handed, and doing three things at once.” —CM23

This concern directly links to one of the defining characteristics that sets episodic volunteers apart from habitual volunteers.
The scenario outlined in the quote above clearly identifies a key issue with episodic volunteers, namely that their availability tends to be much more restricted. In fact, between episodes of activity, these volunteers may be quite removed from what is happening in a community on a day-to-day basis.

Concern 7.C Episodic contributor’s timeliness and completion of work is poor was mentioned by 14 community managers, with one ranking it as the biggest concern. CM24 summarized the concern:

“The main problem of using this kind of help is that sometimes you don’t know whether a person that has started a task is able to finish it all or finish it with a decent quality.” —CM24

Concern: 7.C Episodic contributor’s timeliness and completion of work is poor

Episodic contributors may have less investment in ensuring that their work is completed in a timely manner, or is completed at all. This can be especially problematic if the work is important and others are relying on it. In a situation such as an event, it may be unavoidable to put responsibility on episodic participants.

This concern alludes to the asymmetry of information possessed by community managers and episodic contributors concerning the contributors’ intentions. While contributors are generally aware of their progress and the extent of their dedication to the task, this information is often not conveyed to community managers. For community managers, it becomes difficult to rely on work being completed, or completed to a sufficient standard. With an episodic contributor the problem can be more pronounced, because the community manager may be unable to form an expectation of the quality of future work based on previous experience with the contributor’s work.

CM6 explained why 10.C Community has difficulty identifying appropriate tasks for episodic contributors is a concern. Fifteen community managers had experience with this issue, and one thought it was the most important concern.

“You need to know the context and background for each task to be effective and not get lost. The problem is that to prepare this information usually requires more time than doing the task itself, so normally the person with the knowledge is the one that will do it. It ends up with few people doing a lot of work and possible contributors without knowledge of how to help.” —CM6

Concern: 10.C Community has difficulty identifying appropriate tasks for episodic contributors

Community managers find it difficult to identify and maintain a list of suitable tasks. It can be time-consuming to describe tasks so that they can be picked up by episodic contributors.

It is recommended that episodic contributors be given stand-alone tasks, which can be accomplished without a deep understanding of the project.

It is only in recent years that many FLOSS communities have sought to create strategies for particular aims, such as retaining newcomers or recognizing non-code contributions. Managing episodic contributors also benefits from a recognition of the problem, identification of the desired outcome, and an evaluation of practices which might be used to achieve the goal. In our previous study, community managers did not report making use of any practices for managing EV [6]. This study shows that FLOSS communities are adopting or adapting practices for managing EV. The fact that managing EV effectively remains a pressing concern demonstrates the need for a study such as ours, which collects and codifies the experience of multiple community managers to create a larger body of knowledge.

### 4.2 Practices for Managing Episodic Volunteering

We organized the identified practices into a number of categories based on the “lifecycle” of episodic contributors’ engagement. In practice, a community will not address these categories sequentially, but will move between them, iterate through them, or use practices in parallel. However, organizing the practices in categories can help to communicate them to FLOSS community managers. Each practice is aimed at ameliorating one or more of the concerns described in the previous section.

In total, we identified 65 practices in our study across the five categories. Table 4 provides a complete list of practices, along with a brief description of each practice. Of the 65 practices, 48 were confirmed (indicated by a checkmark) to be in use by at least three community managers for the specific purpose of managing EV. The remaining 17 practices were proposed by our panel experts for EV management; they were used by zero, one, or two community managers.

### Table 4: Practices for Managing Episodic Volunteering (✓ = confirmed as in use by at least three community managers)

| Conf. Code | Name | Description |
|------------|------|-------------|
| **Community Governance** | | |
| ✓ G.1 | Manage the delivery triangle | Adjust scope (quality or features) or schedule when project releases cannot be completed on schedule at the desired level of quality with the expected features. |
| ✓ G.2 | Use longer delivery cycles | Make release cycles longer in order to give episodic contributors the opportunity to contribute without intense time pressure. People who have multiple responsibilities will be able to participate in the project. |
| ✓ G.3 | Host in-person meetings | Host in-person meetings for creative or organizational work involving multiple volunteers. The frequency of meetings may vary by project: it could be yearly, quarterly, monthly, or even more frequent. |
| ✓ G.4 | Make decisions in public | Ensure that decisions are made in a process which is both public and open to suggestions from contributors. Even if the decision is ultimately made by an authoritative body, the transparency of the process can make participants feel a part of it. |
| ✓ G.5 | Create a community definition of quality | Create a community definition of quality so that episodic contributors will know what quality is expected. |
| ✓ G.6 | Craft a community vision | Craft an inclusive community vision and a code of conduct. A clear vision statement helps people determine if they want to participate in the community. |
| ✓ G.7 | Define measuring and success | Define what successful engagement of episodic contributors looks like. Describe how you will measure the impact. |
| G.8 | Centralize budgeting of sponsorships | Centralize the processing of sponsorships and reimbursements so that all claims will be processed in the same manner, and processing will be timely. |
| G.9 | Use an external provider for sponsorships | Hire an external service provider to serve as an intermediary in providing sponsorships. |
| G.10 | Make your leadership diverse | Try to have a diverse board or coordination group to review processes and ensure that they are welcoming and accessible. |
| G.11 | Seek sponsorship | Look for a stable sponsor to ensure continuity of events. |
| **Community Preparation** | | |
| ✓ P.1 | Identify appropriate tasks | Episodic participants can more easily join if tasks are available. Identify the types of tasks which are suited for episodic contributors. |
| ✓ P.2 | Define one-off tasks | Create stand-alone, one-off tasks. |
| ✓ P.3 | Crowdsource identifying appropriate tasks | Engage experienced contributors in a short-term initiative to identify outstanding issues which could be handled by episodic contributors. Encourage them to continue to identify new tasks, once the backlog has been addressed. |
| ✓ P.4 | Document general working practices | Document the community’s working practices, placing particular emphasis on those areas which are most likely to be relevant to new and episodic contributors, and where contributions will be most appreciated. |
| ✓ P.5 | Detail how to complete a task | Do not just summarize tasks, but detail the steps that need to be taken, and consider providing a time estimate for the task. |
| ✓ P.6 | List current areas of activity | Prioritize tasks and tag them as entry level where appropriate. Group similar tasks together. |
| ✓ P.7 | Hold open progress meetings | Hold regular open meetings where previous work is summarized, and new tasks are assigned. |
| ✓ P.8 | Create working groups with a narrow focus | Create specialized working groups that people can identify with. |
| ✓ P.9 | Create written records of activity | Maintain a summary, for instance in the form of a newsletter, which describes the key discussions and resolutions which took place during a given period. Alternately, rely on written communications (mailing lists, chats) or provide meeting minutes. |
| ✓ P.10 | Keep communication channels active | Ensure that communication channels both online and offline are monitored, and that queries are directed to appropriate people. Make sure that people receive responses. |
| ✓ P.11 | Send ambassadors to small events | Send ambassadors to attend smaller events, to enable personal interactions with potential participants. |
| ✓ P.12 | Respond to all submissions | Respond to every submission in a timely manner. |
| ✓ P.13 | Have a social media team | Recruit people who enjoy social media specifically for the task of communicating with potential and episodic contributors. |
| ✓ P.14 | Set expiration dates | Set distinct deadlines for initiatives. |
| ✓ P.15 | Create continual points of entry | Create ongoing ways for people to join the project and contribute, rather than providing only specific times or points in the process when people can join. |
| P.16 | Share success stories | Share stories about outstanding or long-serving community members and the challenges they faced and benefits they received. |
| P.17 | Provide templates for presentations | Create one or more standard slide decks which your contributors can use with or without modification. |
| P.18 | Write modular software | Ensure that software is modular. |
| P.19 | Educate sponsoring organizations | Educate sponsoring organizations about participation in open source projects, including topics such as the necessity of maintenance and the open model of production. |
| P.20 | Offer a consistent development environment | Document the workflow and architecture of the module, and use a container to build your project in order to allow people to easily build a local system. Decide upon one recommended way to set up a development environment and focus on this in the documentation. |
| **Onboarding Contributors** | | |
| O.1 | Learn about the experience, preferences, and time constraints of participants | Ask new and infrequent contributors about their expectations, availability, preferences and experience. |
| O.2 | Screen potential contributors | Screen potential contributors to determine if they are a good match for the role. This may include having availability at the appropriate time, or being able to commit to a certain amount of time. |
| O.3 | Guide people to junior jobs | Guide people to junior jobs when they do not know where to start. |
| O.4 | Give a choice of tasks | Give participants a choice of the task, from a small number offered to them. |
| O.5 | Manage task assignments with an application | Use an application, such as a wiki or bug tracking system, to handle the assignment process. |
| O.6 | Explain the need for maintenance | Educate contributors about what happens to a contribution after it is included in the project. Explain the benefits to the project if they remain available to maintain their contribution. |
| O.7 | Offer guided introductory events | At events, offer walk-through tutorials on getting started as a contributor, culminating in a hackathon working on a specific beginner problem. |
| **Working with contributors** | | |
| W.1 | Have a key contributor responsible | For every important project, make sure that one key contributor is responsible for managing it and responding to inquiries. |
| W.2 | Issue reminders | Send a reminder as the deadline approaches. Be persistent in following up on deliverables. |
| W.3 | Give permission to quit a task | Give people permission to skip a period or task, without recrimination. |
| W.4 | Encourage people to quit | Encourage people who no longer wish to fulfill a role or complete tasks to step down. |
| W.5 | Automate checking the quality of work | Utilize advances in continuous integration/continuous delivery to automate routine evaluation. |
| W.6 | Set expectations | Set expectations for deliverables and communication, even if these are minimal. |
| W.7 | Reject contributions of insufficient quality | Decline contributions which are inappropriate or not of sufficient quality. |
| W.8 | Mentor to quality | Provide mentoring when contributions are rejected due to insufficient quality. This might include access to tools to help people meet quality requirements. Ensure that contributors can always reach out to mentors to get up to speed. |
| W.9 | Require documentation as part of the submission | Require people to sufficiently document their submissions before they are accepted. |
| W.10 | Encourage learners to mentor | Engage episodic contributors in leading other episodic contributors. Let them review episodic contributions and mentor episodic contributors. |
| W.11 | Explain the context of the contribution | Understanding the larger context requires time that not all episodic contributors are able or willing to give. |
| W.12 | Sever ties | Publicly sever the group’s connection to the individual and explain the reasoning. |
| W.13 | Automate process assistance | Consider automation to help people work through the early processes, such as a chat bot or step-by-step interactive site. |
| **Contributor Retention** | | |
| R.1 | Publicize your release schedule | Publish your development and release schedule and notify contributors of upcoming milestones, to allow them to plan their engagement. |
| R.2 | Encourage social connections | Encourage people to work together in a small group to accomplish a task. This might also include groups within a company, who can use a... |
| R.3 | Follow up on contributors | Keep in touch with contributors, even if just by sending an email. |
| R.4 | Instill a sense of community | Help people to understand the cooperative values that underlie free and open source software. This is best done by leading through example. |
| ✓ R.5 | Acknowledge all contributions | Have someone responsible for recognizing returning episodic contributors. This person could thank episodic contributors for returning, or alternately, explicitly welcome new contributors. |
| ✓ R.6 | Reward participation | Offer a tangible reward for participation, such as an organizer’s dinner or swag. Alternatively, offer recommendation letters, certificates, or online recommendations. |
| ✓ R.7 | Recognize everyone | Make use of systems such as badges to recognize the variety of different contributions people can make. At the conclusion of a cycle, thank and identify contributors. |
| ✓ R.8 | Praise publicly | Praise volunteers publicly. |
| ✓ R.9 | Provide evaluations and a promotion path | Provide assessment and opportunities to episodic contributors. Examples of assessment are skill exploration and personal evaluation. Examples of opportunities are travel, employment consideration, succession planning, and skill building. |
| R.10 | Promote episodic contributors | Give sustained episodic participants access to rotating leadership positions which depend on experience rather than continuous contributions. |
| ✓ R.11 | Announce milestones and celebrate meeting goals | Announce when milestones have been met, and celebrate success. |
| ✓ R.12 | Listen to suggestions | Allow anyone who participates to propose what they want to implement, even if the decisions are ultimately made by a steering committee. If concepts don’t fit in with the primary project goals, allow people to create unofficial initiatives, provided these don’t damage the project. |
| ✓ R.13 | Incorporate unofficial successes | Invite creators of unofficial initiatives to incorporate them in the main project if they are successful and of high quality. Alternatively, if the project is stand-alone, recognize these successes within the project. |
| ✓ R.14 | Rotate focus areas on schedule | Rotate between different focus areas with a consistent schedule. |
Table 4 gives only a brief description of each practice; the full descriptions are considerably more detailed. In the following subsections, we include as exemplars the full description of one confirmed practice from each category that was not previously described in the literature (see Table 5). The full descriptions of all practices can be found in the appendix [84], available in the online supplemental material.

The full description of a practice includes the context which may limit the generalizability of the practice, a list of the concerns involved, and a solution. It can optionally include challenges which may arise with implementing the solution, a list of community managers participating in the study who have used the practice, and a list of community managers who suggested but have not used the practice. Additionally, each practice can include a list of related practices. For the most part, practices are not meant to be used in isolation, but to be combined with related practices. Section 4.3 provides examples of how practices can be combined. Relationships between practices can take the following forms, all of which are shown in at least one of the exemplar practices chosen to demonstrate them:

- **General/Specific** describes a relationship where the specific practice is a more restricted and specialized practice, compared to the general practice. It is demonstrated in R.9 Provide evaluations and a promotion path (a general practice) and O.2 Screen potential contributors (a specific practice).
- **Alternative** describes two or more practices which address the same concerns with largely incompatible solutions. An example of this relationship is shown in P.8 Create working groups with a narrow focus.
- **Preceding/Succeeding** is a relationship where practices are best applied in sequential order. An example of this relationship is found in G.5 Create a community definition of quality, which shows both preceding and succeeding practices.
- **Complementary** describes the situation where practices work well when combined with other practices. W.10 Encourage learners to mentor demonstrates this relationship.

### 4.2.1 Community Governance

The category **Community Governance** contains practices that address broad questions about how the community operates. These are practices that will affect a potential episodic contributor’s first impressions of what kind of community it is. One example of a practice in this category is G.5 Create a community definition of quality. CM24 stated they were able to make more extensive use of episodic contributors once the community began “documenting our standards of quality.” Another community manager, CM16, explained that new and episodic contributors are typically expected to know what the project considers “quality work,” but that “we never really explain it in a way that’s easy to learn, so it ends up being a barrier to entry.”

Practice G.5: Create a community definition of quality

**Context:** Episodic contributors do not necessarily know what level of quality is expected. The community is large and mature enough that lack of a common perspective causes problems, and contributors cannot be expected to tacitly acquire the knowledge.

**Concerns:**
- 4.C Episodic contributor lacks understanding of project vision
- 6.C Episodic contributor quality of work is insufficient
- 7.C Episodic contributor’s timeliness and completion of work is poor
- 11.C Community lacks an episodic strategy

**Solution:** Create a community definition of quality so that episodic contributors will know what quality is expected. It will become significantly easier to follow many of the subsequent practices if quality is defined within the community.

**Related practices:**
- P.4 Document general working practices is a COMPLEMENTARY practice.
- G.6 Craft a community vision is a possible PRECEDING step.
- P.10 Keep communication channels active is a possible PRECEDING step.
- P.13 Have a social media team is a possible PRECEDING step.
- G.7 Define measuring and success is a possible SUCCEEDING step.
- P.5 Detail how to complete a task is a possible SUCCEEDING step.
- P.6 List current areas of activity is a possible SUCCEEDING step.
- W.5 Automate checking the quality of work is a possible SUCCEEDING step.
- W.6 Set expectations is a possible SUCCEEDING step.
- W.7 Reject contributions of insufficient quality is a possible SUCCEEDING step.
- W.8 Mentor to quality is a possible SUCCEEDING step.

**Challenges:** It can be difficult to retroactively apply a definition of quality to an existing project, when not all participants are in agreement.

**Used by:** CM13, CM14, CM15, CM18, CM24

**Proposed by:** CM16, CM19
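The related practices above name W.5 Automate checking the quality of work as a possible succeeding step. As a rough sketch of how a written quality definition could be made executable in that spirit, the script below runs each expectation as a check in a continuous integration job; the expectations, commands, and the quality guide it points to are all hypothetical, not drawn from any studied community.

```python
import subprocess
import sys

# Hypothetical encoding of a community definition of quality (G.5):
# each expectation is paired with a command that verifies it.
QUALITY_CHECKS = [
    ("code style follows community conventions", ["flake8", "src/"]),
    ("unit tests pass", ["pytest", "-q"]),
    ("documentation builds", ["make", "-C", "docs", "html"]),
]

def main() -> int:
    failures = []
    for expectation, command in QUALITY_CHECKS:
        if subprocess.run(command).returncode != 0:
            failures.append(expectation)
    for expectation in failures:
        # Point contributors to the written definition instead of
        # leaving expectations tacit.
        print(f"Not met: {expectation} -- see the community quality guide")
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(main())
```

Run in continuous integration, a gate like this gives episodic contributors immediate, impersonal feedback on the community’s expectations (W.5) and leaves reviewers free to mentor (W.8).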
### 4.2.2 Community Preparation

The category **Community Preparation** contains practices associated with preparing the community to engage episodic contributors. Identifying appropriate tasks and lowering barriers to entry are part of this group. CM4 explained the reasoning behind practice P.8 Create working groups with a narrow focus to prepare the community for accepting episodic contributors:

“By focusing the working group on a topic that people can identify with, we hope that episodic contributors have an easier time identifying what is useful to them and then have a place to contribute.” —CM4

Practice P.8: Create working groups with a narrow focus

**Context:** The project is too complex for participants to easily comprehend it in its entirety. It is not possible to readily identify stand-alone tasks in the project.

**Concerns:**
- 2.C Episodic contributor lacks awareness of opportunities to contribute

**Solution:** Create specialized working groups that people can identify with. With a narrow focus and defined outcomes, episodic contributors will be able to find tasks more readily.

**Related practices:**
- P.6 List current areas of activity is a possible ALTERNATIVE step.
- P.18 Write modular software is a possible ALTERNATIVE step.
- P.18 Write modular software is a COMPLEMENTARY practice.
- P.18 Write modular software is a possible PRECEDING step.
- O.1 Learn about the experience, preferences, and time constraints of participants is a possible PRECEDING step.

**Challenges:** Contributions within the working groups will need to be reported back to the larger group.

**Used by:** CM2, CM3, CM4, CM5, CM6, CM16
### 4.2.3 Onboarding Contributors

The category **Onboarding Contributors** contains practices that can be applied when a new episodic contributor joins the community. O.2 Screen potential contributors is part of the collection of practices for incorporating episodic contributors. A community manager explained why screening can be beneficial:

“The first criteria of contribution should be the availability/commitment of participants to donate their time (specifically mentioned as a time frame). This will help reviewers and community leaders to estimate the impact of the contributions.” —CM14

Practice O.2: Screen potential contributors

**Context:** In order for a contributor to properly perform a role, a certain minimum commitment is required. The project has repeated problems with people insufficiently committing to roles.

**Concerns:**
- 3.C Community lacks knowledge of availability of episodic contributors
- 4.C Episodic contributor lacks understanding of project vision
- 5.C Episodic contributor and community have mismatched expectations
- 10.C Community has difficulty identifying appropriate tasks for episodic contributors

**Solution:** Screen potential contributors to determine if they are a good match for the role. This may include having availability at the appropriate time, or being able to commit to a certain amount of time. This makes it more likely that the commitment will be met.

**Related practices:**
- O.1 Learn about the experience, preferences, and time constraints of participants is a more GENERAL practice.

**Challenges:** Some people will be prevented from pursuing the role, but if there are other forms of contribution it does not prevent them from participating altogether. Assessing potential contributors requires effort.

**Used by:** CM3, CM8, CM10, CM13, CM14

### 4.2.4 Working With Contributors

The category **Working with contributors** contains practices applied during the period that the episodic contributor is working on an assignment. These practices ensure that episodic contributors’ contributions can be used by the community. A study participant expressed an interest in applying the practice W.10 Encourage learners to mentor when working with contributors:

“It should be possible for the people reviewing episodic contributions to be a different group than the most active developers, so reviews of episodic contributions don’t eat away the time available for other larger contributions. I almost think of this like a mentorship, and the pool of reviewers might even be episodic contributors themselves, who have learned enough to spend part of their limited time on the project reviewing episodic contributions by others.” —CM16

Another community manager explained how the process can also benefit the mentor:

“Encouraging someone to answer questions on IRC, for example, communicates that you think that they grasp the concepts.” —CM2

Practice W.10: Encourage learners to mentor

**Context:** Highly active contributors have limited time to mentor episodic contributors.

**Concerns:**
- 2.C Episodic contributor lacks awareness of opportunities to contribute
- 4.C Episodic contributor lacks understanding of project vision
- 8.C Community’s cost of supervision exceeds benefit of episodic contribution
- 11.C Community lacks an episodic strategy

**Solution:** Engage episodic contributors in leading other episodic contributors. Let them review episodic contributions and mentor episodic contributors. Episodic contributors are likely to understand the concerns and limitations of other episodic contributors. Using returning episodic contributors to lead episodic contributors lets core contributors focus on other areas, and recognizes the competency of returning episodic contributors.

**Related practices:**
- P.16 Share success stories is a COMPLEMENTARY practice.
- W.1 Have a key contributor responsible is a COMPLEMENTARY practice.
- W.8 Mentor to quality is a COMPLEMENTARY practice.
- R.2 Encourage social connections is a COMPLEMENTARY practice.

**Used by:** CM2, CM5, CM12, CM13

**Proposed by:** CM11, CM16
### 4.2.5 Contributor Retention

The category **Contributor Retention** contains practices that encourage contributors to return. CM13 explained why R.9 Provide evaluations and a promotion path is a useful retention practice:

“It is also important to provide episodic volunteers with metric achievement in the community for their time dedicated and tasks completed. They can grow from basic volunteers to representatives, mentors, influential leaders and even employees, motivating results and retention.” —CM13

Another community manager described an additional benefit for the community:

“[Skills exploration and skill building sessions] can prove helpful as the main goal would be to know what skills episodic volunteers have and what skills they can develop to contribute to more projects (long term or short term).” —CM14

Practice R.9: Provide evaluations and a promotion path

**Context:** Episodic contributors are unable to develop as contributors. There is sustained episodic participation, and absences do not affect the completion of duties.

**Concerns:**
- 15.C Community gives episodic contributors reduced access to opportunities and rewards

**Solution:** Provide assessment and opportunities to episodic contributors. Examples of assessment are skill exploration and personal evaluation. Examples of opportunities are travel, employment consideration, succession planning, and skill building. Sustained episodic participants are encouraged to continue contributing and are more beneficial to the community.

**Related practices:**
- R.10 Promote episodic contributors is a more SPECIFIC practice.

**Used by:** CM13, CM14, CM22

**Proposed by:** CM1

### 4.3 Workflows

Many practices are of limited effectiveness if implemented alone. For instance, it would be impossible to implement O.3 Guide people to junior jobs without first implementing P.1 Identify appropriate tasks, but it would also be ineffective to initiate P.1 without planning to advertise it. However, with a wide range of practices, some tuned to specific contexts, there is no single correct way for a community manager to combine practices to achieve a particular goal.

We asked participants how they might combine practices into a workflow in order to address an important concern. The responses to this question can be seen as examples of how community managers approached the task, and are illustrative for practitioners who wish to understand how to leverage the extensive list of practices that resulted from this study. While it is beyond the scope of this article to identify specific workflows of practices that could be applied to any community—largely due to the fact that communities are only beginning to address EV—the links to related practices within each practice description provide guidance on how community managers have envisioned combining practices.
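Because the preceding/succeeding links impose a partial order, a workflow can also be treated as a small dependency graph. The sketch below encodes the workflow proposed by CM6 and discussed with Fig. 2 below; the graph representation and the use of Python’s graphlib are our illustration, not an artifact of the study.

```python
from graphlib import TopologicalSorter

# CM6's workflow for concern 11.C (see Fig. 2): P.1 and W.1 are
# complementary, and both precede P.10 and P.13. Each key maps a
# practice to the practices that must precede it.
PRECEDES = {
    "P.1 Identify appropriate tasks": set(),
    "W.1 Have a key contributor responsible": set(),
    "P.10 Keep communication channels active": {
        "P.1 Identify appropriate tasks",
        "W.1 Have a key contributor responsible",
    },
    "P.13 Have a social media team": {
        "P.1 Identify appropriate tasks",
        "W.1 Have a key contributor responsible",
    },
}

# static_order() yields one valid implementation order; complementary
# practices carry no mutual ordering and could be worked in parallel.
for practice in TopologicalSorter(PRECEDES).static_order():
    print(practice)
```

An encoding like this also makes it easy to check that a proposed workflow contains no cycles before sharing it with the community, since graphlib raises a CycleError in that case.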
Each workflow consists of a number of practices, to be implemented sequentially or simultaneously, which together form one possible solution to a specific concern. All workflow diagrams are provided in the appendix [84], available in the online supplemental material.

Fig. 2 depicts an example workflow proposed by CM6 to address concern 11.C Community lacks an episodic strategy. The diagram shows the practices P.1 Identify appropriate tasks and W.1 Have a key contributor responsible as COMPLEMENTARY practices because they are not directly connected to each other, but both PRECEDE practice P.10 Keep communication channels active. P.13 Have a social media team also SUCCEEDS P.1 and W.1.

Another workflow is shown in Fig. 3. It was devised by CM19, and depicts an alternative approach to addressing the same concern. This shows the very individual way in which community managers might combine practices to address a concern, based on their own experience and idiosyncratic understanding of their communities.

5 Discussion and Conclusion

### 5.1 Discussion

### 5.1.1 Diversity of Practices

In this study we sought to identify the concerns community managers have about episodic volunteers, and identify the practices that they are using—or envisage using—to address these concerns. To do this we conducted a policy Delphi study of community managers.

We looked for study participants engaged in different communities, from different countries, and representing communities of different sizes. In order to identify any relationship between responses based on these dimensions, responses were coded with the community name, countries involved, and activities the community manager had experience with. Observed variations in practices based upon any of the identified dimensions are described in the Context field of the full description of practices.

**Community size** was an important factor in how episodic contributors are informed about developments. Smaller communities favored a less formal approach such as P.7 Hold open progress meetings, while larger communities recommended O.5 Manage task assignments with an application. Mature communities were more concerned with governance and automation practices such as G.5 Create a community definition of quality, W.5 Automate checking the quality of work, O.5 Manage task assignments with an application, and W.13 Automate process assistance.

**Country** was only associated with one difference. Specifically, reimbursement solutions such as G.8 Centralize budgeting of sponsorships and G.9 Use an external provider for sponsorships were more frequently mentioned in less developed countries, regardless of location. However, it is important to note that the context for these practices is participants who need sponsorship, and this situation can arise in any country. FLOSS communities had rather consistent concerns and practices around the world and we were unable to observe any cultural differences.
Future work might revisit the earlier studies which suggested culture is a factor in FLOSS participation, to determine if this still holds true.\n\n**Contribution type** produced the greatest amount of diversity in practices. In particular, event organization supplied a number of practices primarily applicable to this context. Software development was another area that stood out as influencing practices. For example, G.3 Host in-person meetings is primarily an event-planning practice, while P.18 Write modular software is clearly specific to software development. Practices specific to one type of work within the FLOSS community were of course less likely to be confirmed than general practices applicable to multiple types of contributions. This may be the reason that some practices, such as P.20 Offer a consistent development environment and P.17 Provide templates for presentations, were not confirmed. Future research could focus on confirming practices for specific aspects of FLOSS work, and on determining the prevalence of their use.\n\n**Gender** was not directly included in our study design, although participants could introduce gender as context to a problem or solution if they considered it relevant. One participant did mention gender, but as a general statement, noting that women are more responsive to recruitment:\n\n> \u201c...in my experience women are more active in volunteering if they find the community responsive. I clearly see the difference in managing gender-related communities and regular communities, that more clearly represent the state of the industry.\u201d \u2014CM24\n\nFLOSS literature suggests that responsive communities are more welcoming to all participants [73], [96], which aligns with the participant\u2019s subsequent statement:\n\n> \u201cMaking the community friendly for women means making it friendly for everyone who is a kind person, because everyone would feel included and involved. [It\u2019s easy to see if this is succeeding, because women are] literally half of the population.\u201d \u2014CM24\n\nOther ways of increasing female participation include appreciation for diverse teams, tracking of female participation, and improved mentoring [59], [67].\n\n**Workflows** show another aspect of variation, less easy to quantify. The work of a community manager is \u201cpeople-centric and versatile,\u201d [97] and it is their implicit and tacit knowledge of their communities which undoubtedly plays a role in determining the construction of a workflow. Future research could try to elicit the factors which go into such decisions.\n\n### 5.1.2 Comparison to Previous Studies\n\nWe identified 65 practices, but we note that this list of practices may not be exhaustive. We compared our findings to an earlier study of onboarding guidelines, which were based on interviews with community managers, diaries of newcomers, and literature [41]. Although their study focused on newcomers, we expected to find overlap because episodic contributors can often only be identified in retrospect [72], not when they join. We also compared our results with our earlier study, where potential practices for managing EV were proposed based on interviews with community managers and the EV literature [6]. Table 5 includes the complete list of practices proposed by the two previous studies, in addition to an overlapping subset of practices from this study.\n\nIn total, nine practices appeared in the other studies which were not found in our study. 
Two practices were identified from the onboarding study [41], and eight from the earlier EV study [6] (one practice was found in both other studies but not in ours). Some of this difference can be explained by variable levels of granularity. For instance, Consider time-based releases could be seen as a specific implementation of R.1 Publicize your release schedule. The different research approaches also explain some of the difference. While the previous EV study provided suggestions based on the EV literature, some of these recommendations, such as Evaluate assets, availability and assignments, may not be widely known or systematically applied in FLOSS communities. Still other practices may have been considered so mainstream that participants did not need to mention them, such as Good documentation. In the end, our study identified 52 practices which were not described in the previous studies, in addition to 13 which were previously described (see Table 5). Our emphasis on identifying practices explains why so many new practices relevant to EV were found. Many of these practices are familiar in the FLOSS domain because community managers are adapting existing practices to the EV context.

### 5.2 Limitations of the Study

The Delphi method is a qualitative method, and so the traditional criteria used for quantitative studies (such as internal validity, external validity, and reliability) are not appropriate due to epistemological differences. Instead, qualitative research is best evaluated by an alternative set of criteria for naturalistic inquiries proposed by Guba [95]. Guba’s criteria are credibility, transferability, dependability, and confirmability.

**Credibility.** Credibility concerns how plausible, or true, the findings are. Our confidence in the results is strengthened by the fact that the practices were identified iteratively, over a ten-month period. This meant that there were many opportunities for participants to reflect on the information which was presented and to amend it. By design, a Delphi study involves member checking during the theory development phase. Preliminary results were also shared with a community manager not involved in the study as an additional form of member checking.

**Transferability.** Guba recommends purposive sampling as a means of ensuring the transferability of the results [95]. We identified three dimensions which the literature suggested might affect our results and created a diverse Delphi study panel. We were able to observe situations where the dimensions limited the applicability of practices, but were also able to identify broadly applicable practices. We were able to differentiate between novel suggestions and practices which are already in use.

**Dependability.** Dependability is strengthened by maintaining an audit trail. We maintained anonymized as well as original copies of all responses, including feedback on the collation. We retained a copy of the collation in the state it appeared after each round as well as after feedback was received on the collation. Any supplemental documents developed in creating the collation were also retained in a project repository.

**Confirmability.** There were multiple opportunities for study participants to correct researcher bias. The multiple phases of a Delphi study allow participants to respond to the developing theory; this is a form of member checking.
In addition, we reflected our understanding back to participants with a personalized report of the practices we understood them to have tried or advocated, and requested corrections.

### 5.3 Conclusion

The identification of 65 practices, 52 of which had not been previously described in the context of managing EV in FLOSS, demonstrates that many community managers are actively thinking about how to incorporate EV. Our study confirms that 74 percent of the practices we identified are being actively used. This is in contrast to our earlier qualitative survey on the state of EV in FLOSS communities, where we found that community managers were aware of EV but were not taking any specific steps to manage it [6]. Given the nascent state of the literature on EV in FLOSS communities, this study fills a significant gap. We also described the relationships between practices and gave some examples of how practices can be combined to form a workflow. The findings of this study can be readily adopted by FLOSS community managers.

We further identified 16 concerns that community managers have about EV in their communities, and identified how frequently they were observed by our participants. These concerns were ranked by the expert panel members of this study. The ranked list provides a roadmap for future research, as it offers clues as to where researchers and practitioners might direct their energy. Concerns are linked with practices for addressing them, opening the possibility of future studies investigating the effectiveness of different approaches.

With the collection of practices [84] we have created an extensive guide for managing EV in FLOSS which can be readily understood by researchers and practitioners, and which draws upon the experiences of seasoned community managers from a number of different communities, geographic regions, and areas of expertise. To the best of our knowledge, this study is the first to gather practices for managing episodic contributors in FLOSS communities. Given the increasing attention to episodic contributors as a phenomenon within the open source literature, we believe this study provides a timely foundation for future work in this area.

### ACKNOWLEDGMENTS

The authors would like to thank the community mentors who contributed significant time to participate in this study: R. Bowen, N. Bowers, A.-I. Chiuta, S. M. Coughlan, A. El Achêche, B. “bex” Exelbierd, L. Kisuuki, N. Kolokotronis, G. Lelarge, G. Link, S. Park, Pkpacheco, A. Pinheiro, A. Randal, J. A. Rey, C. Shorter, H. Tabunshchyk, L. Vancsa, H. Woo, S. Zacchiroli, V. Zimmerman, and the participants who preferred to remain anonymous. Additionally, we would like to thank the reviewers for their constructive feedback. Finally, S. B. Segletes provided helpful formatting advice. This work was supported, in part, by Science Foundation Ireland grants 13/RC/2094 and 15/SIRG/3293.

### REFERENCES

[1] K. Nakakoji, Y. Yamamoto, Y. Nishinaka, K. Kishida, and Y. Ye, “Evolution patterns of open-source software systems and communities,” in *Proc. Int. Workshop Princ. Softw. Evol.*, 2002, pp. 76–85.

[2] A. Mockus, R. T. Fielding, and J. D. Herbsleb, “Two case studies of open source software development: Apache and Mozilla,” *ACM Trans. Softw. Eng. Methodology*, vol. 11, no. 3, pp. 309–346, 2002.

[3] K. Crowston, H. Annabi, J. Howison, and C. Masango, “Effective work practices for software engineering: Free/libre open source software development,” in *Proc. Workshop Interdisciplinary Softw. Eng. Res.*, 2004, pp. 18–26.
[4] G. Pinto, I. Steinmacher, and M. A. Gerosa, “More common than you think: An in-depth study of casual contributors,” in *Proc. 23rd Int. Conf. Softw. Anal. Evol. Reengineering*, 2016, vol. 1, pp. 112–123.

[5] A. Lee and J. C. Carver, “Are one-time contributors different? A comparison to core and periphery developers in FLOSS repositories,” in *Proc. Int. Symp. Empir. Softw. Eng. Meas.*, 2017, pp. 1–10.

[6] A. Barcomb, A. Kaufmann, D. Riehle, K.-J. Stol, and B. Fitzgerald, “Uncovering the periphery: A qualitative survey of episodic volunteering in free/libre and open source software communities,” *IEEE Trans. Softw. Eng.*, 2018. [Online]. Available: http://dx.doi.org/10.1109/TSE.2018.2872713

[7] A. Barcomb, K.-J. Stol, D. Riehle, and B. Fitzgerald, “Why do episodic volunteers stay in FLOSS communities?” in *Proc. Int. Conf. Softw. Eng.*, 2019, pp. 948–959. [Online]. Available: https://cora.ucc.ie/handle/10468/7248

[8] N. Macduff, “Societal changes and the rise of the episodic volunteer,” *Emerg. Areas Volunteering*, vol. 1, no. 2, pp. 49–61, 2005.

[9] F. Tang, N. Morrow-Howell, and E. Choi, “Why do older adult volunteers stop volunteering?” *Ageing Soc.*, vol. 30, no. 5, pp. 859–878, 2010.

[10] D. A. Harrison, “Volunteer motivation and attendance decisions: Competitive theory testing in multiple samples from a homeless shelter,” *J. Appl. Psychol.*, vol. 80, no. 3, pp. 371–385, 1995.

[11] R. A. Cnaan and F. Handy, “Towards understanding episodic volunteering,” *Vrijwillige Inzet Onderzocht*, vol. 2, no. 1, pp. 29–35, 2005.

[12] L. Bao, X. Xia, D. Lo, and G. C. Murphy, “A large scale study of long-time contributor prediction for GitHub projects,” *IEEE Trans. Softw. Eng.*, to be published, doi: 10.1109/TSE.2019.2918536.

[13] J. Gamalielsson and B. Lundell, “Sustainability of open source software communities beyond a fork: How and why has the LibreOffice project evolved?” *J. Syst. Softw.*, vol. 89, pp. 128–145, 2014.

[14] M. Foucault, M. Palyart, X. Blanc, G. C. Murphy, and J.-R. Falleri, “Impact of developer turnover on quality in open-source software,” in *Proc. 10th Joint Meeting Found. Softw. Eng.*, 2015, pp. 829–841.

[15] D. Izquierdo-Cortazar, G. Robles, F. Ortega, and J. M. González-Barahona, “Using software archaeology to measure knowledge loss in software projects due to developer turnover,” in *Proc. 42nd Hawaii Int. Conf. Syst. Sci.*, 2009, pp. 1–10.

[16] M. Zhou and A. Mockus, “Who will stay in the FLOSS community? Modeling participant’s initial behavior,” *IEEE Trans. Softw. Eng.*, vol. 41, no. 1, pp. 82–99, Jan. 2015.

[17] M. A. Hager, “Toward emergent strategy in volunteer administration,” *Int. J. Volunt. Adm.*, vol. 29, no. 3, pp. 13–22, 2013.

[18] N. Macduff, “Episodic volunteers: Reality for the future,” *Voluntary Action Leadership*, vol. Spring, pp. 15–17, 1990.

[19] K. Culp III and M. Nolan, “Trends impacting volunteer administrators in the next ten years,” *J. Volunt. Adm.*, vol. 19, no. 1, pp. 10–19, 2000.

[20] L. Hustinx and F. Lammertyn, “Collective and reflexive styles of volunteering: A sociological modernization perspective,” *Voluntas: Int. J. Voluntary Nonprofit Organizations*, vol. 14, no. 2, pp. 167–187, 2003.
[20] L. Hustinx and F. Lammertyn, "Collective and reflexive styles of volunteering: A sociological modernization perspective," *Voluntas: Int. J. Voluntary Nonprofit Organizations*, vol. 14, no. 2, pp. 167–187, 2003.

[21] K. A. Smith, K. Holmes, D. Haski-Leventhal, R. A. Cnaan, F. Handy, and J. L. Brudney, "Motivations and benefits of student volunteering: Comparing regular, occasional, and non-volunteers in five countries," *Can. J. Nonprofit Soc. Econ. Res.*, vol. 1, no. 1, 2010, Art. no. 65.

[22] R. A. Cnaan, H. Daniel Heist, and M. H. Storti, "Episodic volunteering at a religious megaevent," *Nonprofit Manage. Leadership*, vol. 1, no. 1, pp. 1–14, 2017.

[23] S. Koch and G. Schneider, "Effort, co-operation and co-ordination in an open source software project: GNOME," *Inf. Syst. J.*, vol. 12, no. 1, pp. 27–42, 2002.

[24] T. T. Dinh-Trong and J. M. Bieman, "The FreeBSD project: A replication case study of open source development," *IEEE Trans. Softw. Eng.*, vol. 31, no. 6, pp. 481–494, Jun. 2005.

[25] J. Davies, H. Zhang, L. Nussbaum, and D. M. German, "Perspectives on bugs in the Debian bug tracking system," in *Proc. 7th Work. Conf. Mining Softw. Repositories*, 2010, pp. 86–89.

[26] F. Rullani and S. Haefliger, "The periphery on stage: The intra-organizational dynamics in online communities of creation," *Res. Policy*, vol. 42, no. 4, pp. 941–953, 2013.

[27] D. Riehle, P. Riemer, C. Kolassa, and M. Schmidt, "Paid vs. volunteer work in open source," in *Proc. 47th Hawaii Int. Conf. Syst. Sci.*, 2014, pp. 3286–3295.

[28] G. Pinto, L. F. Dias, and I. Steinmacher, "Who gets a patch accepted first? Comparing the contributions of employees and volunteers," in *Proc. 11th IEEE/ACM Int. Workshop Cooperative Hum. Aspects Softw. Eng.*, 2018, pp. 110–113.

[29] A. Capiluppi, K.-J. Stol, and C. Boldyreff, "Exploring the role of community stakeholders in open source software evolution," in *Proc. IFIP Int. Conf. Open Source Syst.*, 2012, pp. 178–200.

[30] B. Lundell et al., "Addressing lock-in, interoperability, and long-term maintenance challenges through open source: How can companies strategically use open source?" in *Proc. IFIP Int. Conf. Open Source Syst.*, 2017, pp. 80–88.

[31] L. F. Dias, I. Steinmacher, and G. Pinto, "Who drives company-owned OSS projects: Employees or volunteers?" in *Proc. V Workshop Softw. Vis. Evol. Maintenance*, 2017, Art. no. 10.

[32] L. Dahlander and M. G. Magnusson, "Relationships between open source software companies and communities: Observations from Nordic firms," *Res. Policy*, vol. 34, no. 4, pp. 481–493, 2005.

[33] P. J. Ågerfalk and B. Fitzgerald, "Outsourcing to an unknown workforce: Exploring opensourcing as a global sourcing strategy," *MIS Quart.*, vol. 32, no. 2, pp. 385–409, 2008.

[34] G. Von Krogh and S. Spaeth, "The open source software phenomenon: Characteristics that promote research," *J. Strategic Inf. Syst.*, vol. 16, no. 3, pp. 236–253, 2007.

[35] K. Carillo, S. Huff, and B. Chawner, "What makes a good contributor? Understanding contributor behavior within large free/open source software projects—A socialization perspective," *J. Strategic Inf. Syst.*, vol. 26, no. 4, pp. 322–359, 2017.
[36] C. Jensen and W. Scacchi, "Role migration and advancement processes in OSSD projects: A comparative case study," in *Proc. 29th Int. Conf. Softw. Eng.*, 2007, pp. 364–374.

[37] Y. Fang and D. Neufeld, "Understanding sustained participation in open source software projects," *J. Manage. Inf. Syst.*, vol. 25, no. 4, pp. 9–50, 2009.

[38] D. Rozas, "Self-organisation in commons-based peer production, Drupal: 'The drop is always moving'," Ph.D. dissertation, University of Surrey, Guildford, U.K., 2017. [Online]. Available: https://davidrozas.cc/phd

[39] M. Osterloh and S. Rota, "Open source software development—just another case of collective invention?" *Res. Policy*, vol. 36, no. 2, pp. 157–171, 2007.

[40] R. Pham, L. Singer, and K. Schneider, "Building test suites in social coding sites by leveraging drive-by commits," in *Proc. Int. Conf. Softw. Eng.*, 2013, pp. 1209–1212.

[41] I. Steinmacher, C. Treude, and M. A. Gerosa, "Let me in: Guidelines for the successful onboarding of newcomers to open source projects," *IEEE Softw.*, vol. 36, no. 4, pp. 41–49, Jul./Aug. 2019.

[42] D. Sholler, I. Steinmacher, D. Ford, M. Averick, M. Hoye, and G. Wilson, "Ten simple rules for helping newcomers become contributors to open source projects," *PLoS Comput. Biol.*, vol. 15, no. 9, 2019, Art. no. e1007296.

[43] K. Crowston and J. Howison, "The social structure of free and open source software development," *First Monday*, vol. 10, no. 2, 2005.

[44] K. R. Lakhani, "The core and the periphery in distributed and self-organizing innovation systems," Ph.D. dissertation, Massachusetts Institute of Technology, Cambridge, MA, 2006.

[45] R. Krishnamurthy, V. Jacob, S. Radhakrishnan, and K. Dogan, "Peripheral developer participation in open source projects: An empirical analysis," *ACM Trans. Manage. Inf. Syst.*, vol. 6, no. 4, pp. 14–45, 2016.

[46] P. Setia, B. Rajagopalan, V. Sambamurthy, and R. Calantone, "How peripheral developers contribute to open source software development," *Inf. Syst. Res.*, vol. 23, no. 1, pp. 144–163, 2012.

[47] J. Wang, "Survival factors for free open source software projects: A multi-stage perspective," *Eur. Manage. J.*, vol. 30, no. 4, pp. 352–371, 2012.

[48] B. Vasilescu, A. Serebrenik, M. Goeminne, and T. Mens, "On the variation and specialisation of workload—A case study of the Gnome ecosystem community," *Empir. Softw. Eng.*, vol. 19, no. 4, pp. 585–1008, 2014.

[49] G. Von Krogh, S. Spaeth, and K. R. Lakhani, "Community, joining, and specialization in open source software innovation: A case study," *Res. Policy*, vol. 32, no. 7, pp. 1217–1241, 2003.

[50] L. Dahlander and S. O'Mahony, "Progressing to the center: Coordinating project work," *Organization Sci.*, vol. 22, no. 4, pp. 961–979, 2011.

[51] C. Amrit and J. van Hillegersberg, "Exploring the impact of sociotechnical core-periphery structures in open source software development," *J. Inf. Technol.*, vol. 25, no. 2, pp. 216–229, 2010.

[52] K. Neuling, A. Hannemann, R. Klamma, and M. Jarke, "A longitudinal study of community-oriented open source software development," in *Proc. Int. Conf. Adv. Inf. Syst. Eng.*, 2016, pp. 509–523.
[53] A. Capiluppi and M. Michlmayr, "From the cathedral to the bazaar: An empirical study of the lifecycle of volunteer community projects," in *Proc. IFIP Int. Conf. Open Source Syst.*, 2007, pp. 31–44.

[54] H. Masmoudi, M. den Besten, C. de Loupy, and J.-M. Dalle, "Peeling the onion," in *Proc. IFIP Int. Conf. Open Source Syst.*, 2009, pp. 284–297.

[55] G. Von Krogh, S. Haefliger, S. Spaeth, and M. W. Wallin, "Carrots and rainbows: Motivation and social practice in open source software development," *MIS Quart.*, vol. 36, no. 2, pp. 649–676, 2012.

[56] A. Lee, J. C. Carver, and A. Bosu, "Understanding the impressions, motivations, and barriers of one time code contributors to FLOSS projects: A survey," in *Proc. 39th Int. Conf. Softw. Eng.*, 2017, pp. 187–197.

[57] A. Labuschagne and R. Holmes, "Do onboarding programs work?" in *Proc. 12th Work. Conf. Mining Softw. Repositories*, 2015, pp. 381–385.

[58] I. Steinmacher, M. A. G. Silva, M. A. Gerosa, and D. F. Redmiles, "A systematic literature review on the barriers faced by newcomers to open source software projects," *Inf. Softw. Technol.*, vol. 59, pp. 67–85, 2015.

[59] S. Balalí, I. Steinmacher, U. Annamalai, A. Sarma, and M. A. Gerosa, "Newcomers' barriers... is that all? An analysis of mentors' and newcomers' barriers in OSS projects," *Comput. Supported Cooperative Work*, vol. 27, pp. 679–714, 2018.

[60] C. Mendez et al., "Open source barriers to entry, revisited: A sociotechnical perspective," in *Proc. Int. Conf. Softw. Eng.*, 2018, pp. 1004–1015.

[61] S. Bayati, "Understanding newcomers success in open source community," in *Proc. 40th Int. Conf. Softw. Eng. Companion*, 2018, pp. 224–225.

[62] I. Steinmacher, M. A. Gerosa, T. U. Conte, and D. F. Redmiles, "Overcoming social barriers when contributing to open source software projects," *Comput. Supported Cooperative Work*, vol. 28, no. 1/2, pp. 247–290, 2019.

[63] I. Steinmacher, G. Pinto, I. Wiese, and M. A. Gerosa, "Almost there: A study on quasi-contributors in open-source software projects," in *Proc. 40th Int. Conf. Softw. Eng. Companion*, 2018, pp. 985–1000.

[64] D. Nafus, "'Patches don't have gender': What is not open in open source software projects," *New Media Soc.*, vol. 14, no. 4, pp. 256–266, 2012.

[65] K. Carillo and J.-G. Bernard, "How many hawks can hide under an umbrella? An examination of how lay conceptions conceal the contexts of free/open source software," in *Proc. Int. Conf. Inf. Syst.*, 2015. [Online]. Available: https://dblp.org/rec/conf/icis/CarilloB15

[66] D. Nafus, "'Patches don't have gender': What is not open in open source software projects," *New Media Soc.*, vol. 14, no. 4, pp. 669–683, 2012.

[67] A. Bosu and K. Z. Sultana, "Diversity and inclusion in open source software (OSS) projects: Where do we stand?" in *Proc. ACM/IEEE Int. Symp. Empir. Softw. Eng. Meas.*, 2019, pp. 1–11.

[68] D. Izquierdo, N. Huesman, A. Serebrenik, and G. Robles, "OpenStack gender diversity report," *IEEE Softw.*, vol. 36, no. 1, pp. 28–33, Jan./Feb. 2019.
[68] M. Storey, A. Zagalsky, F. F. Filho, L. Singer, and D. M. German, "How social and communication channels shape and challenge a participatory culture in software development," *IEEE Trans. Softw. Eng.*, vol. 43, no. 2, pp. 185–204, Feb. 2017.

[69] M. Burnett, A. Peters, C. Hill, and N. Elarief, "Finding gender-inclusiveness software issues with GenderMag: A field investigation," in *Proc. CHI Conf. Hum. Factors Comput. Syst.*, 2016, pp. 2586–2598.

[70] M. K. Hyde, J. Dunn, P. A. Scuffham, and S. K. Chambers, "A systematic review of episodic volunteering in public health and other contexts," *BMC Public Health*, vol. 14, no. 1, pp. 992–1008, 2014.

[71] M. K. Hyde, J. Dunn, C. Bax, and S. K. Chambers, "Episodic volunteering and retention: An integrated theoretical approach," *Nonprofit Voluntary Sector Quart.*, vol. 45, no. 1, pp. 45–63, 2016.

[72] L. M. Bryen and K. M. Madden, "Bounce-back of episodic volunteers: What makes episodic volunteers return?" Queensland University of Technology, Brisbane, Australia, Rep. no. CPNS32, 2006.

[73] I. Steinmacher, I. Wiese, A. P. Chaves, and M. A. Gerosa, "Why do newcomers abandon open source software projects?" in *Proc. 6th Int. Workshop Cooperative Hum. Aspects Softw. Eng.*, 2013, pp. 25–32.

[74] R. D. Safrit and M. V. Merrill, "Management implications of contemporary trends in volunteerism in the United States and Canada," *J. Volunt. Adm.*, vol. 20, no. 2, pp. 12–23, 2002.

[75] M. Nunn, "Building the bridge from episodic volunteerism to social capital," *Fletcher World Aff.*, vol. 24, pp. 115–127, 2000.

[76] L. C. P. M. Meijs and J. L. Brudney, "Winning volunteer scenarios: The soul of a new machine," *Int. J. Volunt. Adm.*, vol. 24, no. 6, pp. 789–799, 2007.

[77] M. Turoff, "The design of a policy Delphi," *Technological Forecasting Soc. Change*, vol. 2, no. 2, pp. 149–171, 1970.

[78] N. Dalkey and O. Helmer, "An experimental application of the Delphi method to the use of experts," *Manage. Sci.*, vol. 9, no. 3, pp. 458–467, 1963.

[79] W. T. Weaver, "The Delphi forecasting method," *Phi Delta Kappan*, vol. 52, no. 5, pp. 267–271, 1971.

[80] H. A. Linstone and M. Turoff, Eds., *The Delphi Method: Techniques and Applications*, vol. 18. Boston, MA, USA: Addison-Wesley, 2002.

[81] L. E. Miller, "Determining what could/should be: The Delphi technique and its application," 2006.

[82] K. Conboy and B. Fitzgerald, "Method and developer characteristics for effective agile method tailoring: A study of XP expert opinion," *ACM Trans. Softw. Eng. Methodol.*, vol. 20, no. 1, 2010, Art. no. 2.

[83] M. F. Krafft, K.-J. Stol, and B. Fitzgerald, "How do free/open source developers pick their tools? A Delphi study of the Debian project," in *Proc. 38th Int. Conf. Softw. Eng. Companion*, 2016, pp. 232–241.

[84] A. Barcomb, K.-J. Stol, B. Fitzgerald, and D. Riehle, "Appendix to: Managing episodic contributors in free/libre/open source software communities," *IEEE Trans. Softw. Eng.*, to be published, doi: 10.1109/TSE.2020.2985093.

[85] C. Okoli and S. D. Pawlowski, "The Delphi method as a research tool: An example, design considerations and applications," *Inf. Manage.*, vol. 42, no. 1, pp. 15–29, 2004.

[86] K. Q. Hill and J. Fowles, "The methodological worth of the Delphi forecasting technique," *Technological Forecasting Soc. Change*, vol. 7, no. 2, pp. 179–192, 1975.
[87] R. Loo, "The Delphi method: A powerful tool for strategic management," *Policing: Int. J. Police Strategies Manage.*, vol. 25, no. 4, pp. 762–769, 2002.

[88] A. L. Delbecq, A. H. van de Ven, and D. H. Gustafson, *Group Techniques for Program Planning: A Guide to Nominal Group and Delphi Processes*. Glenview, IL, USA: Scott Foresman, 1975.

[89] A. Carvalho and M. Sampaio, "Volunteer management beyond prescribed best practice: A case study of Portuguese non-profits," *Personnel Rev.*, vol. 46, no. 2, pp. 410–428, 2017.

[90] Y. Takhteyev and A. Hilts, "Investigating the geography of open source software through GitHub," University of Toronto, Toronto, Canada, 2010. [Online]. Available: http://www.takhteyev.org/papers/Takhteyev-Hilts-2010.pdf

[91] J. C. Crotts and S. W. Litvin, "Cross-cultural research: Are researchers better served by knowing respondents' country of birth, residence, or citizenship?" *J. Travel Res.*, vol. 42, no. 2, pp. 186–190, 2003.

[92] Y. Y. Kim, "Intercultural personhood: Globalization and a way of being," *Int. J. Intercultural Relations*, vol. 32, no. 4, pp. 359–368, 2008.

[93] V. Braun and V. Clarke, "Using thematic analysis in psychology," *Qualitative Res. Psychol.*, vol. 3, no. 2, pp. 77–101, 2006.

[94] D. Riehle, N. Harutyunyan, and A. Barcomb, "Pattern discovery and validation using scientific research methods," Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen, Germany, Tech. Rep. CS-2020-01, Mar. 2020. [Online]. Available: https://dirkriehle.com/wp-content/uploads/2020/03/cs-fau-tr-2020-01.pdf

[95] E. G. Guba, "Criteria for assessing the trustworthiness of naturalistic inquiries," *Educ. Technol. Res. Develop.*, vol. 29, no. 2, pp. 75–91, 1981.

[96] V. Singh and W. Brandon, "Open source software community inclusion initiatives to support women participation," in *Proc. IFIP Int. Conf. Open Source Syst.*, 2019, pp. 68–79.

[97] H. Mäenpää, M. Munezero, F. Fagerholm, and T. Mikkonen, "The many hats and the broken binoculars: State of the practice in developer community management," in *Proc. 13th Int. Symp. Open Collaboration*, 2017, Art. no. 1.

Ann Barcomb received the PhD degree from the University of Limerick, Limerick, Ireland. She is a member of the Open Source Research Group, Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany, and of Lero, the Irish Software Research Centre. Throughout her career, she has been active in free/libre/open source software, in particular the Perl community. She can be contacted at ann@barcomb.org.

Klaas-Jan Stol is a lecturer with the School of Computer Science and Information Technology, University College Cork, Cork, Ireland, an SFI principal investigator, and a funded investigator with Lero, the Irish Software Research Centre. His research interests include research methodology and contemporary software development approaches. He can be contacted at k.stol@ucc.ie.

Brian Fitzgerald is the director of Lero, the Irish Software Research Centre. He holds an endowed chair, the Frederick Krehbiel II Chair in Innovation in Business and Technology, at the University of Limerick, Limerick, Ireland. His research interests include open source software, inner source, crowdsourcing, and agile methods. He can be contacted at bf@lero.ie.

Dirk Riehle received the PhD degree in computer science from ETH Zürich, Zürich, Switzerland.
He is a professor of computer science at Friedrich-Alexander University, Erlangen, Germany. He once led the Open Source Research Group at SAP Labs, Silicon Valley, and founded the Open Symposium (OpenSym). He was the lead architect of the first UML virtual machine. He blogs at http://dirkriehle.com and can be reached at dirk@riehle.org.
{"id": "eb98d74d31946e20d17188268a3c01e89a8a75bb", "text": "When and How to Make Breaking Changes: Policies and Practices in 18 Open Source Software Ecosystems\n\nCHRIS BOGART, CHRISTIAN K\u00c4STNER, and JAMES HERBSLEB,\nCarnegie Mellon University, USA\nFERDIAN THUNG, Singapore Management University, Singapore\n\nOpen source software projects often rely on package management systems that help projects discover, incorporate, and maintain dependencies on other packages, maintained by other people. Such systems save a great deal of effort over ad hoc ways of advertising, packaging, and transmitting useful libraries, but coordination among project teams is still needed when one package makes a breaking change affecting other packages. Ecosystems differ in their approaches to breaking changes, and there is no general theory to explain the relationships between features, behavioral norms, ecosystem outcomes, and motivating values. We address this through two empirical studies. In an interview case study, we contrast Eclipse, NPM, and CRAN, demonstrating that these different norms for coordination of breaking changes shift the costs of using and maintaining the software among stakeholders, appropriate to each ecosystem\u2019s mission. In a second study, we combine a survey, repository mining, and document analysis to broaden and systematize these observations across 18 ecosystems. We find that all ecosystems share values such as stability and compatibility, but differ in other values. Ecosystems\u2019 practices often support their espoused values, but in surprisingly diverse ways. The data provides counterevidence against easy generalizations about why ecosystem communities do what they do.\n\nCCS Concepts: \u2022 Software and its engineering \u2192 Collaboration in software development; Software development process management; Software libraries and repositories; \u2022 Human-centered computing \u2192 Empirical studies in collaborative and social computing;\n\nAdditional Key Words and Phrases: Software ecosystems, dependency management, semantic versioning, collaboration, qualitative research\n\nACM Reference format:\nChris Bogart, Christian K\u00e4stner, James Herbsleb, and Ferdian Thung. 2021. When and How to Make Breaking Changes: Policies and Practices in 18 Open Source Software Ecosystems. ACM Trans. Softw. Eng. Methodol. 30, 4, Article 42 (July 2021), 56 pages.\nhttps://doi.org/10.1145/3447245\n\nThis work has been supported by by NSF awards 1901311, 1546393, 1302522, 1322278, 0943168, 1318808, 1633083, and 1552944, the Science of Security Lablet (H9823014C0140), the U.S. Department of Defense through the Systems Engineering Research Center, and a grant from the Alfred P. Sloan Foundation.\n\nAuthors\u2019 addresses: C. Bogart, C. K\u00e4stner, and J. Herbsleb, Carnegie Mellon University, Institute for Software Research TCS Hall 430, 4665 Forbes Avenue, Pittsburgh, PA 15213; emails: {cbogart, ckaestner, jherbsleb}@cs.cmu.edu; F. Thung, Singapore Management University, School of Computing and Information Systems, 80 Stamford Road, Singapore 178902; email: ferdiant.2013@smu.edu.sg.\n\nPermission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. 
© 2021 Association for Computing Machinery.

1 INTRODUCTION

Software ecosystems are communities built around shared programming languages, shared platforms, or shared dependency management tools, which allow developers to create packages that import and build on each other's functionality. Software ecosystems have become an important paradigm for organizing open source software development and for maintaining and reusing code packages. Development within ecosystems is efficient in the sense that common functionality need only be developed, maintained, and tested by a single team, instead of many authors reimplementing the same functionality.

Coordination is a major challenge in software ecosystems, since packages tend to be highly interdependent yet independently maintained [2, 3, 6, 21, 55, 68]. In at least some ecosystems, such as JavaScript, transitive dependency networks are growing rapidly [46]. Improvements that a maintainer makes to a shared package may affect many users of that package, for example, by incorporating new features, making APIs simpler, or improving maintainability [10]. Any of these actions may require rework from developers whose software depends on that package. Package users may invest in regular rework to keep up with changes, collaborate with upstream projects to minimize the impact of those changes, decline to update to the latest versions (at the risk of missing bug fixes or security updates), or replicate functionality to avoid dependencies in the first place [6, 17, 19, 72]. Package maintainers, in turn, have many ways to reduce the burden on their users. For example, they can refrain from performing changes, announce and clearly label breaking changes, or help their users migrate from old to new versions [6, 36, 65, 67]. Many different practices can contribute to managing change, and adopting various practices can shift some of the cost (in the form of effort) among different classes of ecosystem participants, such as maintainers, package users, and end-users (e.g., Reference [28]).

While much is known about some individual practices for managing change, we do not yet understand how these practices occur in the wild, nor how they combine to make up the full design space of practices. Managing change takes time and effort from both upstream and downstream developers, and depending on their community's practices, this cost may be distributed differently. However, we do not fully understand the distributions of costs that result from various practices, nor how practices are related to ecosystem culture and technologies. This is important not only from a research perspective, to acquire an understanding of ecosystem coordination mechanisms, but also for practitioners and sponsors who may need to tune the distribution of costs to accommodate changing conditions. For example, as an ecosystem accumulates a large and rapidly growing base of applications that use particular packages, its community may wish to adopt practices that increase the stability of those packages, to avoid imposing the costs of change on a large and growing base of users. What practices could accomplish this?
Of this set of practices, which are likely to be compatible with the adopting ecosystem's culture and values?

We perform two studies to address questions like this. First, we conducted a multiple case study (Study 1) of three open source software ecosystems with different philosophies toward change: Eclipse, R/CRAN, and Node.js/npm. We studied how developers plan, manage, and coordinate change within each ecosystem, how change-related costs are allocated, and how developers are influenced by and influence change-related expectations, policies, and tools in the ecosystem. In each ecosystem, we studied public policies and policy discussions and interviewed developers about their expectations, communication, and decision-making regarding changes. We found that developers employ a wide variety of practices that shift or delay the costs of change within an ecosystem. Expectations about how to handle change differ substantially among the three ecosystems and influence cost-benefit tradeoffs among those who develop packages used by others (whom we will call upstream developers), the developer-users of such packages (whom we will call downstream developers), and end-users. We argue that these differences arise from different values in each community and are reinforced through peer pressure, policies, and tooling. For example, long-term stability is a central value of the Eclipse community, achieved by their "prime directive" practice of never permitting breaking changes. This practice imposes costs on upstream developers, who may accept substantial opportunity costs and technical debt to avoid breaking client code. In contrast, the Node.js/npm community values ease and simplicity for upstream developers and has a technical infrastructure in which breaking changes are accepted, but signaled clearly through version numbering.

Our second study builds on and expands the scope of the first, investigating the prevalence of the practices, and attitudes toward the values, identified in Study 1, across a larger set of 18 ecosystems. We combine several methods to accomplish this, including mining of software repositories to identify those practices that leave visible traces, document analysis to identify policy-level practices that are stated explicitly, and a large-scale survey to ask developers about many other practices as well as the importance of various values within the ecosystem. In Study 2, we find that practices and values are indeed often cohesive within an ecosystem, but diverse across different ecosystems. We also find that even when ecosystems share similar values, they often pursue them in different ways, or sometimes fail to realize them at all, promoting practices that are never widely adopted or do not work well. Together, our results provide a map of the distribution of values and practices across these ecosystems and allow us to examine the relationships between values and practices. Beyond these findings, we make our full anonymized results available to the research community, in hopes they will be useful in future studies, for example, by providing a basis for selecting cases with particular combinations of practices and values.

This work builds on and extends our previously published conference paper [6], including much of the material in Section 4.
The data is available as an archived dataset [7] as well as an interactive web page (http://breakingapis.org).

Our contributions include a description of breaking-change-related values and practices in three ecosystems, a taxonomy of values and of practices, and a mapping of those values and practices across 18 ecosystems derived from a survey, data mining, and policy analysis.

2 CONCEPTS AND DEFINITIONS

**Software ecosystems.** For this study, we define software ecosystems as communities built around shared programming languages, shared platforms, or shared dependency management tools, allowing developers to create packages that import and build on each other's functionality. In line with the definitions of Lungu [50] and of Jansen and Cusumano [43], we focus on "collection[s] of software projects which are developed and which co-evolve together in the same environment" [50, p. 27], which have interdependent but independently developed packages, and which generally share a technology platform or a set of standards [43]. Such ecosystems typically center on some means to package, version, and often host software artifacts, and to manage dependencies among them [1, 47, 51, 61, 74].

Note that the term "software ecosystem" is overloaded and used with different definitions in different lines of research [52], including ones that focus on commercial platforms that can be enhanced with third-party contributions [40, 56, 81, 83]. We focus especially on open-source communities developing interdependent libraries (e.g., Maven, npm, CPAN), rather than more centralized platforms where usually independent extensions provide a single application but do not build on each other (e.g., Photoshop plugins, Android apps); we also exclude ecosystems that repackage software projects and their dependencies for deployment (e.g., Debian packages, homebrew), as they are often managed by independent volunteers rather than the original software developers.

**Breaking changes.** There are many relevant software development concerns when maintaining interdependent artifacts as a community. We focus on the coordination issue of deciding whether and how to perform breaking changes and how downstream developers respond.

In this article, we define a breaking change as any change in a package that would cause a fault in a dependent package if it were to blindly adopt that change. We thus include not only cases where a change in an API would cause a downstream package to fail to compile, but also cases where program behavior would change, leading to incorrect results or unacceptable performance. We examine breaking-change-related practices quite broadly, including not only reactions to actual breaking changes, but also practices meant to signal, mitigate, or prevent breaking changes.
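To make this definition concrete, consider a small hypothetical example in Python (none of this code comes from the studied ecosystems): the "2.0" function below keeps the name and arity of its "1.x" predecessor, so client code still imports and calls it, yet a downstream package that blindly adopts the update faults at runtime.

```python
import datetime

# Hypothetical upstream package "datelib", version 1.x:
def parse_date(text):
    """Parse 'YYYY-MM-DD' and return a (year, month, day) tuple."""
    year, month, day = text.split("-")
    return (int(year), int(month), int(day))

# Hypothetical version 2.0 of the same API. The name and arity are
# unchanged, so nothing fails to import, but the return type differs:
# under the definition above this is a breaking change, because a
# client that blindly adopts the update faults at runtime.
def parse_date_v2(text):
    """Parse 'YYYY-MM-DD' and return a datetime.date."""
    return datetime.date.fromisoformat(text)

# Downstream code written against 1.x:
y, m, d = parse_date("2016-01-15")           # works: the tuple unpacks
try:
    y, m, d = parse_date_v2("2016-01-15")    # the "same" call after updating
except TypeError as err:
    print("breaking change:", err)           # cannot unpack non-iterable date
```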
Maintaining dependencies and updating one's own code to react to breaking changes is a significant cost driver when using otherwise free open-source dependencies. Breaking changes are common in practice [3, 5, 6, 14, 22, 29, 39, 44, 48, 53, 54, 66–68, 89, 90]. For example, Decan et al. [22] found that 5% of package updates in CRAN were backward incompatible, causing 41% of the errors in released dependent packages. Xavier et al. [90] report that 28% of releases of frequently used Java libraries break backward compatibility, with the rate of breaking changes increasing over time. Information hiding [63], centralized change control [29, 73], and change impact analysis [8, 84] can all guide decision making, but cannot entirely prevent the need for breaking changes in practice, given the large-scale, open, and distributed nature of software ecosystems [6, 59, 62, 76, 90].

**Package managers** structure the problem and make dependencies and versions explicit [3, 47, 51], and practices like semantic versioning assign semantics to version numbers (e.g., breaking vs. nonbreaking changes) [65, 67], but these only help to manage change; they neither prevent the problem nor support decision making about when to perform breaking changes.
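To illustrate the semantics that semantic versioning attaches to version numbers, the following minimal sketch (our illustration, not tooling from any studied ecosystem) classifies a version bump the way a semver-aware dependency tool would; real version strings may also carry pre-release and build tags, which this simplification ignores.

```python
def classify_bump(old: str, new: str) -> str:
    """Interpret a version change under semantic versioning
    (major.minor.patch): a major bump signals potentially breaking
    changes, while minor and patch bumps promise backward compatibility."""
    old_major, old_minor, _ = (int(p) for p in old.split("."))
    new_major, new_minor, _ = (int(p) for p in new.split("."))
    if new_major != old_major:
        return "major: may include breaking changes"
    if new_minor != old_minor:
        return "minor: backward-compatible additions"
    return "patch: backward-compatible fixes"

print(classify_bump("1.4.2", "2.0.0"))  # major: may include breaking changes
print(classify_bump("1.4.2", "1.5.0"))  # minor: backward-compatible additions
```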
**Values and practices.** We describe the "why" and "how" of managing breaking changes in software ecosystems in terms of values and practices.

Shared values, that is, judgments of what is important or preferred, can explain how developers make similar decisions. Values have been studied at societal scale in psychology [4], ethics [16], and related fields [12, 37] (e.g., how education influences personal value systems); however, values and their influence on practices have been studied mostly in narrow contexts in software engineering: Pham et al. studied testing culture [64], and Murphy-Hill et al. found that creativity and communication with non-engineers are valued more by game developers than by application developers, resulting in fewer testing and architecture practices in game development [58]. We use the concept of values to analyze common shared beliefs about what is important for an ecosystem, with a focus on change-related issues.

With practices, we refer broadly to activities that developers engage in, again primarily with a focus on managing change. Practices may include specific release strategies, deciding not to perform changes, mitigating the impact of changes through documenting migration paths or reaching out to developers, monitoring changes in dependencies, deciding whether and when to update dependencies, and many more [6].

In ecosystems, practices may be encouraged or mandated by policies (for example, npm and Eclipse mandate the use of semantic versioning in their documentation) and may be supported or even enforced by tools (for example, the Eclipse community's API Tools detect even subtle breaking changes, and CRAN runs automated checks to enforce coding standards and resolve incompatibility issues) [6]. For simplicity, we use the term practice broadly, including policies and tools.

**Governance** in open source and software ecosystems covers community-wide decisions, e.g., how to integrate third-party contributions [11], which model of decision making is generally appropriate [45, 60], how open an ecosystem should be [85], and how people in different roles should be allowed to participate [86]. While some governance research discusses the need for both evolvability and stability of an organization [83], this research focuses on general market mechanisms or on process documentation and conformance [41, 45], not on the technical steps a software engineer might take.

3 METHODS

3.1 Research Design

As stated in the introduction, our goal in this research is to create a high-level map of values and practices relating to breaking changes across many software ecosystems.

We approached this question with an exploratory sequential mixed-methods design [15], beginning with a qualitative preliminary case study to first understand how a community deals with or prevents breaking changes, and why it deals with them in this way. This first study takes a constructivist view, focusing on how the problem of breaking changes looks from the perspective of participants, and asking why they approach this collaboration problem the way they do. We use this to inform a second, primarily quantitative study. The second study is not intended specifically to confirm that the findings generalize (although we do a confirmatory check in Section 5.1), but is rather a broad look at where they generalize, and at whether there is any pattern to the combinations of values and practices we see in the larger landscape outside the three case-study ecosystems. Study 2 casts a broad net, at the cost of depth, by asking high-level questions about many communities; we therefore call for follow-up research that examines particular practices, values, or ecosystems in more depth, bringing more resources to bear on more focused questions. Study 2 shows that there is not a simple relationship between practices and values: we found that communities often act on the same value in different ways.

3.2 Study 1: Interview Case Study

For our first look at ecosystem practices, we performed a multiple case study, interviewing 28 developers in the three ecosystems. Case studies are appropriate for investigating "how" and "why" questions about current phenomena [92]. We selected three contrasting cases to aim for theoretical replication [92], a means to investigate the proposition that phenomena will differ across contrasting cases for predictable reasons.

Eclipse and Node.js/npm served as cases that contrast sharply in their approach to change: Eclipse has interfaces that have not changed for over a decade, while Node.js/npm is a relatively new and fast-moving platform. We expected that Eclipse's policies and tools might impose costs on developers in a way that encouraged them to act consistently with the ecosystem's values of stability. The R/CRAN ecosystem serves as a useful third theoretical replication, since its policy favors compatibility among the latest versions of packages over Eclipse's long-term compatibility with past versions. In addition, CRAN acts as a gatekeeper for a centralized repository, in contrast to npm's intentionally low hurdles for contributions.

We began by mining lists of packages and their dependency relationships from these three ecosystems. We assembled a database of packages, their dependency relationships, and version change histories from the npm repository (metadata retrieved from https://registry.npmjs.org/ in JSON format), the CRAN repositories (scraping metadata from web pages starting from http://cran.r-project.org/web/packages/available_packages_by_name.html), and the git repositories of Eclipse (https://git.eclipse.org/c/).
Table 1. Interviewees. R2 and N4 were pairs of close collaborators, identified as R2a, R2b, N4a, and N4b. All owned packages with both upstream and downstream dependencies.

| Code | Case | Field | Occupation |
|------|------|-------|------------|
| E1 | Eclipse | Programming tools/HCI | University |
| E2 | Eclipse | Soft. Eng./CS Education | University |
| E3 | Eclipse | Soft. Eng./Research | University |
| E4 | Eclipse | CS Education | University |
| E5 | Eclipse | Software engineering | Retired |
| E6 | Eclipse | Software engineering | Industry |
| E7 | Eclipse | Eclipse infrastructure | Industry |
| E8 | Eclipse | Software engineering | Industry |
| E9 | Eclipse | Software engineering | Industry |
| R1 | CRAN | Soil science | Government |
| R2a,b | CRAN | Statistics | University |
| R3 | CRAN | Medical imaging | University |
| R4 | CRAN | Genetics | University |
| R5 | CRAN | Soil science | University |
| R6 | CRAN | Web apps | Industry |
| R7 | CRAN | Data analysis | Industry |
| R8 | CRAN | R infrastructure | Industry |
| R9 | CRAN | R infrastructure | Industry |
| R10 | CRAN | R infrastructure | University |
| N1 | NPM | Telephony | Industry |
| N2 | NPM | Tools for API dev. | Industry |
| N3 | NPM | Web framework | Startup |
| N4a,b | NPM | Web framework | Startup |
| N5 | NPM | Cognitive Science | University |
| N6 | NPM | Database, Node infrastr. | Startup |
| N7 | NPM | Database, Node infrastr. | Industry |

We pursued two complementary recruitment strategies for our interviews. To find package maintainers who would have recent, relevant insight about managing dependencies from both sides of the dependency relationship, we used our mined repository datasets to identify packages that had at least two downstream dependencies and two upstream dependencies, and where both the focal package and at least one of its upstream dependencies had had a version update in the year before the interviews (2015). (The code implementing this filtering is available at https://github.com/cbogart/depalyze/blob/1d867cc92d7a5f18274358ae02574915026a30d5/depalyze/versionhistory.py#L354.)
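A minimal sketch of this selection filter, assuming the mined metadata has been loaded into simple in-memory maps (the names and data structures here are illustrative; the authors' actual Python implementation is linked above):

```python
from datetime import date

def interview_candidates(upstream, downstream, last_release,
                         since=date(2015, 1, 1)):
    """Select packages whose maintainers likely have recent experience
    on both sides of a dependency relationship: at least two upstream
    and two downstream dependencies, plus a recent release of both the
    focal package and at least one of its upstream dependencies.

    upstream[p]     -> set of packages that p depends on
    downstream[p]   -> set of packages that depend on p
    last_release[p] -> date of p's most recent version update
    """
    candidates = []
    for pkg, deps in upstream.items():
        if len(deps) < 2 or len(downstream.get(pkg, ())) < 2:
            continue
        if last_release.get(pkg, date.min) < since:
            continue
        if any(last_release.get(dep, date.min) >= since for dep in deps):
            candidates.append(pkg)
    return candidates
```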
We emailed the owners of a random sample of these packages in small batches, hand-writing each email using the addresses and details supplied in the npm and CRAN repositories or the Eclipse commit logs, and set up interviews with those who responded. We also interviewed three developers whom we or our colleagues knew personally. In all, we contacted 92 people and conducted 26 interviews. Our interviews focused on the developers' personal practices and experiences managing upstream and downstream dependencies.

After 20 interviews, we were hearing similar ideas from each new interviewee, but we recognized the need for deeper experience with the ecosystem-wide origins and impacts of each ecosystem's policies, so we decided to additionally interview individuals with some role (current or historical) in the development of the ecosystem's tools or policies. As these individuals are fewer and there are more demands on their time, we only attempted to find a few key people in each ecosystem; thus, we recruited eight additional developers, asking a few of the same questions but adding questions about the ecosystem's history, policy, and values. All 28 interviewees were active software developers with multiple years of experience, but their backgrounds ranged from university research to startup companies; Table 1 gives an overview.

We conducted semistructured phone interviews that lasted 30–60 minutes. We generally followed the interview script shown in Appendix A, but tailored our questions toward the interviewees' personal experiences. With the interviewees' consent, we recorded all interviews.

In keeping with our constructivist approach to the first study, we analyzed the interviews using thematic analysis [9]. We transcribed the recordings, then tentatively coded the transcripts looking for interesting themes, using Dedoose [23], then iteratively discussed, redefined, and recoded. Codes that emerged in the first round included labels such as "expectations towards change," "communication channels," "opportunity costs of backward compatibility," and "monitoring." We combined redundant codes, eliminated ones that did not recur or did not address our research questions, then grouped the remainder into seven high-level themes: "Change planning: reasons for changes," "Change planning: costs to the changer," "Change planning: technical means and practices," "Change planning: reasoning about cost tradeoffs," "Coping with change," "Communication," and "Ecosystem-wide policy and technology." Next, we gathered tagged quotes from each high-level category, and two researchers checked that they agreed with the low-level tags for each quote in the category, revising any disagreements through discussion.

Thematic analysis does not claim to find reproducible phenomena within the interviews; for example, we did not attempt to compute interrater reliability, since we make no claim that two researchers trained themselves to reliably identify exactly the same utterances from interviewees as examples of "expectations towards change," nor that we have exhaustively identified all instances of such an expectation among our interviewees. As such, we do not apply statistics to our qualitative results or attach much importance to counts; the purpose of the interviews and our thematic analysis is to discover the broad categories of attitudes and strategies toward change that interviewees experienced, with illustrative examples of the typical practices and motivations that constitute those strategies.

To complement our interviews, we explored the policies, public discussions, meeting minutes, and tools in each ecosystem.

In our analysis, we distinguish between decisions made in the roles of upstream and downstream developer, as depicted in Figure 1.

**Validity check.** To validate our findings from the case study, we adapted Dagenais and Robillard's methodology [18] to check fit and applicability as defined by Corbin and Strauss [13, p. 305]. We presented interviewees with both a summary and a full draft of Sections 4.2–4.3, along with questions prompting them to look for correctness and areas of agreement or disagreement (i.e., fit), and for any insights gained from reading about the experiences of other developers and platforms (i.e., applicability).

Six of our interviewees responded with comments on the results; all six indicated general agreement (e.g., R5: "It brings a structure and coherence to issues that I was loosely aware of, but that are too rarely the centre of focus in my everyday work."). Corrections included small factual errors (e.g., the number of CRAN packages had increased since the initial writeup, and is now over 14,000), as well as suggestions of ways to sharpen our analysis (e.g., R7 noted that CRAN's policy to contact downstream developers does not apply to the majority of users outside CRAN).
We incorporated their feedback when it was consistent with a recheck of our data, and added clarifications otherwise.

3.3 Study 2

We then conducted a systematic mapping of values and practices in a broad sample of ecosystems, primarily making use of a survey. Because of the large number and diversity of practices (Tables 4, 5, and 6), we could not measure them all with one methodology. We asked about a large subset of them in the survey (e.g., doing research about dependencies before using them; bottom section of Table 6). We also analyzed documentation and policies to identify practices that are enacted ecosystem-wide by organizations or tools (e.g., ecosystem-wide synchronized releases; Table 4). Finally, we mined GitHub repositories and the libraries.io package metadata dataset for practices that leave visible traces (e.g., "continue critical updates to older versions"; Table 5). Of the 55 practices we identify, there are 19 that we do not attempt to measure in Study 2 (e.g., socially connected developers following each other on Twitter or going to conferences; top section of Table 6).

First, we describe the survey methods; subsequent subsections describe the policy analysis (Section 3.3.5) and data mining (Section 3.3.6) methods.

3.3.1 Ecosystems. We solicited survey participants from ecosystems with a dependency network structure, in which packages can depend on other packages and a standardized infrastructure helps with sharing and compatibility. We started with the list of software repositories from Wikipedia's "Software Repository" page and added additional ecosystems with an active community that we could find.

We excluded ecosystems with a flat structure, where packages depend only on a single shared platform (e.g., Android), and ecosystems obviously too small to hope to get at least a few dozen responses. We also excluded ecosystems if they were different enough that it was not possible to write clear questions that would apply across ecosystems. This excluded, for example, operating-system-level package managers such as apt, rpm, and brew, as well as scientific workflow engines.

We conducted the survey with 31 ecosystems. For our analysis, we somewhat arbitrarily set the minimum number of participants for each ecosystem at 15, feeling that this would give us a reasonable claim to some breadth in the responses. This led us to exclude 13 ecosystems: C++/Boost, Bower, Perl 6, Smalltalk, TeX/CTAN, Julia, Clojure/Clojars, Meteor, WordPress, SwiftPM, PHP's PEAR, Racket, and Dart/pub, leaving us with the 18 ecosystems for our analysis shown in Table 2. All but 2 had more than 40 complete responses.

3.3.2 Survey Goals and Recruitment. The survey consisted of 108 questions: seven long free-text questions (marked as optional opportunities for clarification), three short text questions (ecosystem, package name, and gender), and the rest multiple-choice scales. After an informed consent screen, participants were first asked to choose an ecosystem in which they had published or used a package (they could choose from a list or type in another; we grouped rare answers as "other" for analysis).

3.3.3 Recruitment. We invested in significant outreach activities to recruit participants for the survey. First, we created a web page (https://breakingapis.org) and Twitter account to describe the state of current research in this area, in a form easily accessible to practitioners.
We encouraged readers of the web page to take the survey to contribute additional knowledge about values in ecosystems. Second, we attended community events, including npm.camp 2016, to talk to developers and community leaders from multiple ecosystems about our research; as a result, several prominent community members tweeted about our web page and survey, resulting in surges of responses (particularly for CRAN and npm). Third, we promoted our web page and the survey in ecosystem-specific forums and mailing lists aimed at "developers who write <ecosystem> packages," hoping that our web page would spark interest in the topic. We also posted on Twitter with hashtags appropriate for different ecosystems. Finally, for 21 ecosystems in which our outreach activity did not yield sufficient answers, we solicited individuals directly by email. We sent 8,137 emails to package authors, sampled from the authors of packages culled from libraries.io for the targeted ecosystems.

**Participants and their demographics.** We succeeded in recruiting 2,321 participants who partially or fully completed the survey between August and November of 2016. Of this number, 932 completed the survey; however, we put the value questions near the beginning, so there are 1,466 answers to those questions. Statistical analysis of answers to early questions did not reveal any systematic differences between people who completed the survey and those who did not: the mean difference between the two groups' answers across 65 Likert-scale questions was 0.13 scale points (out of 4 or 5, depending on the question); the maximum difference was 0.83 scale points, and the maximum difference among questions where more than one "incomplete" respondent answered was 0.54 scale points. Since the partial responses were similar to the full responses, we include data from the incomplete responses.

To correct for careless responses, in which people appeared to be answering many questions without careful consideration, we excluded as "careless" those sections of a person's response in which they rated all items exactly the same. We performed this test on eight sections of the survey; the number of excluded blocks ranged from 11 (for a set of upstream practices) to 76 (for a set of downstream practices). When people were excluded from one block, their responses to other questions did not appear to be outliers: the mean difference in answers across 65 Likert-scale questions, between respondents excluded from some other block and respondents who were not, was 0.15 scale points (out of 4 or 5, depending on the question); the maximum difference was 0.50, for the question "How important do you think the following values are to the <ecosystem> community: stability." Because the answers were similar for all questions, we excluded only the affected blocks rather than entire respondents when a person appeared careless in some of the eight blocks.
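A sketch of this screening rule, assuming responses have been tabulated with one row per respondent and one column per item (the column layout and names are our assumption, not the authors' instrument):

```python
import pandas as pd

def flag_careless_blocks(responses: pd.DataFrame, blocks: dict) -> pd.DataFrame:
    """Flag a respondent's survey block as careless when every item in
    that block received exactly the same rating.

    responses: one row per respondent, one column per survey item.
    blocks:    maps a block name to the list of its item columns.
    """
    flags = pd.DataFrame(index=responses.index)
    for name, cols in blocks.items():
        block = responses[cols]
        # nunique(axis=1) == 1 means all of a row's answers are identical;
        # only fully answered blocks are flagged.
        flags[name] = block.notna().all(axis=1) & (block.nunique(axis=1) == 1)
    return flags
```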
Table 2 shows participation by ecosystem. Participants averaged 8.8 years of development experience, 7.2 years in open source, and 4.6 years in the ecosystem they answered about. Slightly more than half (59%) had college degrees in CS. The most frequently claimed role in the ecosystem was package lead developer (59%); others ranged from the 8.5% who claimed a role in the founding or core team of the ecosystem to the 11% who only drew on ecosystem packages for their own projects. The average age was 33, with 152 participants aged 18–24 and 6 over 65. Of those who gave their gender, 95.9% identified themselves as male, 3.2% as female, and 0.8% gave another gender. These demographic proportions are quite similar to a contemporaneous GitHub community survey [31].

3.3.4 Survey Design. Our goal in the survey was to investigate the prevalence of values and practices across as many ecosystems as was feasible. We asked a larger number of questions than is typical for a survey of this sort. Long surveys often have reduced completion rates; we mitigated this by keeping the questions diverse and, we hoped, interesting to the participants, and by putting the questions we were most interested in up front. As a result, we got a reasonably high completion rate (40%) and partial completion rate (62% for the value questions at the beginning) considering the length of the survey, resulting in an encouragingly rich and deep dataset. In this article, we focus on describing the values and practices responses, but additional data is available in the accompanying data release [7].

**Values.** To explore as complete a list as possible of values relevant to managing change, we began with values derived from our interviews in Study 1. We then searched the web pages of all our candidate ecosystems for clues about other potential values. For example, "fun" is mentioned as an explicit value in the Ruby community; in an interview, Ruby founder Matsumoto said, "That was my primary goal in designing Ruby. I want to have fun in programming myself" [82]. Note that some values initially seem not directly related to breaking changes, but we included them if we thought they could indirectly influence breaking-change practices. For example, we expected that if some practices are more efficient but less rewarding to carry out, then a "fun"-valuing ecosystem might avoid them.

We assembled a list of 11 values with the following descriptions:

- **Stability:** Backward compatibility, allowing seamless updates ("do not break existing clients").
- **Innovation:** Innovation through fast and potentially disruptive changes.
- **Replicability:** Long-term archival of current and historic versions with guaranteed integrity, such that the exact behavior of code can be replicated.
- **Compatibility:** Protecting downstream developers and end-users from struggling to find a compatible set of versions of different packages.
- **Rapid access:** Getting package changes through to end-users quickly after their release ("no delays").
- **Quality:** Providing packages of very high quality (e.g., secure and correct).
- **Commerce:** Helping professionals build commercial software.
- **Community:** Collaboration and communication among developers.
- **Openness and fairness:** Ensuring that everyone in the community has a say in decision-making and the community's direction.
- **Curation:** Selecting a set of consistent, compatible packages that cover users' needs.
- **Fun and personal growth:** Providing a good experience for package developers and users.

In the survey, we asked participants about the **perceived values** of the community: "How important do you think the following values are to the <ecosystem> community?" We used a seven-point rating scale, adapted from Schwartz's value study [71]:
"extremely important," "very important," "important," "somewhat important," "not important," "community opposes this value," and "I don't know." The first five options were separated visually from the last two to make clear that only the former were designed to approximate regular intervals (as recommended by Dillman et al. [27]).

In addition, we asked participants a similar value question on the same scale about their **own values** with respect to a single package they worked on in the ecosystem. To encourage participants to think about concrete work that they are doing, we asked for the name of a specific package that they worked on and used that package in the question: "How important are each of these values in development of <package> to you personally?"

Recognizing that, despite taking values from multiple sources, we may not have captured all values relevant to managing change, we asked survey participants in an open-ended question about other values important to their ecosystem. Their answers are summarized in Section 5.2.

**Practices.** The practices part of the survey asked about many software-engineering practices, which we mention throughout our analysis (Tables 4, 5, and 6); the full list and exact phrasing of our questions can be found in Appendix B. Surveyed practices encompassed the participant's personal practices and experiences with respect to documentation, support, timing, and version numbering for releases, selecting packages on which to depend, and monitoring dependencies for changes. These questions were asked, as appropriate, either on an agreement Likert scale as above or on a frequency scale from "never" to "several times a day." A subset of 15 questions relating to communication with developers of downstream packages was skipped for participants who indicated that they did not maintain a package used by others. To limit the length of the survey, we focused primarily on questions that cannot be answered, or are difficult to answer, by mining software repositories or reading explicit policy documents (see the "M" and "P" labels in the Study 2 Methods columns of Tables 4, 5, and 6).

**Survey analysis.** 483 participants (21%) gave an answer to at least one of the seven optional free-response questions; 11 people gave answers to all seven. We used a grounded approach to analyze answers to the question about other values: one researcher performed open coding to identify a set of candidate codes, then two researchers iteratively combined and revised these to achieve a consensus set of codes and applied them to the responses.

**Layout of figures.** Figures 2, 3, and 4 were drawn by eliminating skipped or "don't know" values, merging "not important" with "opposed to this value" answers, and drawing a violin plot with a diamond symbol at the mean position. The violin bodies are smoothed, so the image portrays the mean and a rough distribution.

For Table 10, we wanted to derive a ranking of the importance of the values in each ecosystem and provide an indication of the consensus around the ranking. The method we adopted calculates the highest-ranked values for each ecosystem by identifying, for each person in the ecosystem, the highest rating they gave to any of the 11 values, and then incrementing a count for every value to which that person assigned that highest rating. This has the effect of counting the number of people who ranked each value highest, while accounting for ties. The table lists the values with the three highest counts, and the consensus numbers are as described in the caption.
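A minimal sketch of this counting rule, assuming ratings have been collected into a per-respondent mapping (an illustrative structure, not the authors' code):

```python
from collections import Counter

def top_value_counts(ratings):
    """Count, for each value, the respondents who gave it their highest
    rating; ties credit every value rated at that level.

    ratings: {respondent: {value_name: numeric rating}}, with skipped
    and "I don't know" answers already removed.
    """
    counts = Counter()
    for person in ratings.values():
        if not person:
            continue
        best = max(person.values())
        counts.update(v for v, r in person.items() if r == best)
    return counts

# A respondent rating stability=5 and quality=5 but fun=3 increments
# the counts of both "stability" and "quality".
```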
The table lists the values with the three highest counts, and the consensus numbers are as described in the caption.

3.3.5 Policy Analysis Method. We examined each ecosystem’s online presence and summarized its sanctioned practices. Practices were derived from documentation pages within each language’s and repository’s websites; we specifically sought out documentation about how to define a package and submit it to the repository, as these documents typically communicate policies to authors in a clear, actionable way. The columns of the table were defined as follows:

- **Dependencies outside repository.** Standard tools in all but two ecosystems (Stackage and LuaRocks) allow developers to additionally specify packages that are not part of the standard repository, for example by a reference to a GitHub repository or an alternate specialized site. We checked each package manager’s documentation on the syntax for declaring dependencies, to see if there was a way to specify a URL for a package not formally in the repository. We marked these as having the feature if the dependency could be specified directly as a URL, and as “alternate repo” if this could be accomplished only through an alternate repository or a custom server that mimics the repository’s API.
- **Central Repository.** This captures whether the ecosystem supplies packages in a central repository or simply provides an index to author-hosted download sites.
- **Access to dependency versions.** This denotes whether ecosystem documentation recommends (through examples on the documentation page) that packages refer to dependencies by version number, or simply assume that the latest version of a dependency is desired (R/CRAN and Go).\(^4\) In two cases (Stackage and Bioconductor), a set of mutually compatible versions is provided to be used together as a set.
- **Gatekeeping Standards.** Ecosystem repositories vary in the amount of vetting of the packages they include. We determined this by looking at the submission requirements for packages. An open circle in the table means that no more than cursory metadata, such as the name of the package and its list of dependencies, is required; a closed circle means that platform tools or volunteers perform some deeper investigation of the package: vetting of the submitter, automated or manual tests (of the package or of other packages that depend on it), or virus checks. Two were marked as “staged releases,” because submissions are tested collectively along with a cohort of packages being released simultaneously.
- **Synced Ecosystem.** This simply denotes whether ecosystem packages (or some important subset) are released all at once on a regular, synchronized schedule.

3.3.6 Data Mining. We mined two data sources to capture the prevalence of seven additional practices.

First, the list of packages to query was derived from the libraries.io ([libraries.io/data](https://libraries.io/data)) cross-ecosystem package index. Libraries.io lists versions, their release dates, dependencies with their version constraints, and their source repositories. This information was available for only a subset of our 18 ecosystems (Atom, R/CRAN, Perl/CPAN, Ruby/Rubygems, Rust/Cargo, Python/Pypi, NuGet, Maven, PHP/Packagist, Node.js/NPM, Erlang, Elixir/Hex).
Partial information was available for CocoaPods and Hackage, but not dependencies. Dependency counts for Bioconductor, Hackage, Stackage, Lua, Eclipse, and CocoaPods were scraped from their respective repository websites. We did not find Go dependencies listed centrally in any repository, so we extracted this information from World of Code [57], a massive mirror of GitHub, GitLab, Bitbucket, and other open source software repositories, indexed and searchable in ways that make it more convenient for data mining than GitHub’s APIs allow. One data product World of Code provides is the dependencies of packages, parsed from source code files; we used this to count Go dependencies. Table 3 shows that packages in the ecosystems are interdependent, but to widely differing degrees.

---

4Recommendations have evolved since 2016 for Go: see [https://blog.gopheracademy.com/advent-2016/saga-go-dependency-management/](https://blog.gopheracademy.com/advent-2016/saga-go-dependency-management/).

Table 3. Ecosystem Statistics

| Ecosystem | Founded | Num. Pkgs | Avg. deps | >3 deps | >0 deps |
|----------------------------|---------|-----------|-----------|---------|---------|
| Atom (plugins) | 2014 | 4,424 | 1.2 | 10.0% | 38.2% |
| CocoaPods | 2001 | 14,493 | 0.4 | 1.7% | 21.1% |
| Eclipse (plugins) | 2001 | 14,954 | 6.4 | 55.7% | 100% |
| Erlang, Elixir/Hex | 2013 | 1,304 | 1.0 | 5.3% | 50.5% |
| Go | 2013 | 76,632 | 10.6 | 57.1% | 88.3% |
| Haskell (Cabal/Hackage) | 2003 | 8,593 | 6.4 | 57.9% | 91.6% |
| Haskell (Stack/Stackage) | 2012 | 1,337 | 8.3 | 65.0% | 93.9% |
| Lua/Luarocks | 2007 | 966 | 0.8 | 5.7% | 34.7% |
| Maven | 2002 | 114,404 | 2.1 | 20.6% | 41.8% |
| Node.js/NPM | 2010 | 229,202 | 5.6 | 49.8% | 81.2% |
| NuGet | 2010 | 66,486 | 1.6 | 11.4% | 58.3% |
| Perl/CPAN | 1995 | 31,641 | 7.6 | 56.5% | 79.6% |
| Python/PyPi | 2002 | 65,622 | 0.2 | 2.0% | 8.1% |
| PHP/Packagist | 2012 | 63,860 | 3.1 | 28.1% | 82.7% |
| R/Bioconductor | 2001 | 1,104 | 4.9 | 48.9% | 74.2% |
| R/CRAN | 1997 | 7,922 | 2.9 | 27.9% | 86.7% |
| Rust/Cargo | 2014 | 3,727 | 2.1 | 20.1% | 71.5% |

Package dependency and founding year data for ecosystems. Num. Pkgs = number of packages in the repository we checked as of January 2016; Avg. deps = average number of dependencies sampled packages had; >3 deps = percentage of packages having more than three dependencies; >0 deps = percentage having any dependencies.

Beyond package counts and dependencies, further information about packages in all ecosystems was queried from World of Code [57].

- **Dependency Version Constraints.** We ran pattern-matching on the dependency constraints of all packages in libraries.io released during 2016, and flagged for each package whether it used a particular type of constraint on any one or more of its dependencies at any time during the year. (A simplified sketch of these mining steps appears after this list.)
Note that the percentages add up to over 100%, since a package may use more than one kind of dependency constraint.
  - **Exact:** The dependency version is constrained by a fully specified version number, such as 1.3.2.
  - **Min only:** Version constraints such as >1.3.2, or use of conventions like caret (^) in npm that have the same effect (e.g., ^1.3 is the same as >= 1.3.0).
  - **Range:** Constraints with a minimum and maximum version, like >1.3.2,<2.0; or use of conventions like tilde (~) in npm that have the same effect (e.g., ~1.3.2 means >=1.3.2,<2.0).
  - **Unconstrained:** The dependency name is specified with no version constraint; either the constraint is blank or a symbol like “*” is used.\(^5\)

For a more fine-grained analysis of version constraints across many ecosystems, see Dietrich et al. [26].

- **Lock files.** Using World of Code [57], we examined files committed during 2016 in each ecosystem’s packages, looking for references to a lock file, which specifies exact versions of all dependencies, direct and transitive (i.e., dependencies of dependencies). These files differ by ecosystem and vary in how canonical their use is. The filenames we used in this search are shown in Table 11 in Appendix D. Including a lock file in an end-user distribution of a program makes it more likely the program will run correctly, since it preserves the exact versions of dependencies that the program was tested on. However, developers including many dependencies in their own projects may prefer not to specify the exact versions of all their transitive dependencies, since these may conflict with each other, and developers have the means and opportunity to resolve any conflicts themselves (then perhaps locking in a consistent set of dependencies when producing a release for their own users) [78].

- **Maintaining old versions.** Making bug fixes to outdated versions of code, or even backporting new features, can be helpful for users who cannot update to the cutting-edge versions for some reason. We define prior-version maintenance operationally as any release whose version number is smaller than expected and hence out of sequence: for example, if a sequence of releases was “2.0.1,” “2.0.2,” “1.5.3,” “2.0.3,” then we identify “1.5.3” as a likely bugfix or backported feature (first introduced in 2.0.1 or 2.0.2), released as a courtesy to users still on 1.5.2 who choose not to upgrade to the 2.0 series. Specifically, this measure captures the percentage of packages in each ecosystem whose version number ever decreased in 2016, per data from libraries.io.

- **Cloning.** We measured the percentage of packages in each repository whose projects borrowed a file in 2016 from another package. We did this by building a list of SHA hashes of files (blobs) associated with each commit in each project in the ecosystem through World of Code [57], and looking for overlaps. We count a project as having cloned a file if a commit in 2016 incorporates a blob over 1 kB that was previously seen in some other package in the ecosystem. We only considered blobs derived from other packages in the ecosystem’s repository, not ones derived from projects in the broader realm of open source. We chose to count these within-repository clones specifically, since the developer could have tried to use the ecosystem’s dependency management system to incorporate the desired code by reference, but chose not to. Previous research has also mapped cloning behaviors [33, 49].
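To illustrate the mining steps above, the sketch below shows simplified versions of the constraint classification, the prior-version-maintenance heuristic, and the blob-hash clone detection. The input formats here are hypothetical; the actual analysis ran over libraries.io dumps and World of Code, and real constraint syntax is more varied and ecosystem-specific:

```python
import hashlib
import re

def classify_constraint(c):
    """Simplified version of the constraint pattern-matching described above."""
    c = c.strip()
    if c in ("", "*"):
        return "unconstrained"
    if re.fullmatch(r"\d+(\.\d+)*", c):
        return "exact"                    # e.g., "1.3.2"
    if "," in c or c.startswith("~"):
        return "range"                    # e.g., ">1.3.2,<2.0" or "~1.3.2"
    if c.startswith((">", "^")):
        return "min only"                 # e.g., ">=1.3.2" or "^1.3" (per our coding)
    return "other"

def out_of_sequence_releases(versions):
    """Flag releases whose version number is lower than an earlier release,
    e.g., "1.5.3" published after "2.0.2" (versions in release-date order)."""
    def key(v):
        return tuple(int(p) for p in v.split(".") if p.isdigit())
    flagged, highest = [], ()
    for v in versions:
        if highest and key(v) < highest:
            flagged.append(v)
        highest = max(highest, key(v))
    return flagged
# out_of_sequence_releases(["2.0.1", "2.0.2", "1.5.3", "2.0.3"]) -> ["1.5.3"]

def git_blob_sha(content):
    """SHA-1 of file content, computed the way git hashes blobs."""
    return hashlib.sha1(b"blob %d\x00" % len(content) + content).hexdigest()

def cloned_packages(packages):
    """packages: dict mapping package name -> iterable of committed file
    contents (bytes). Returns packages that incorporated a >1 kB blob first
    seen in another package. (The real pipeline ordered commits by timestamp
    across the whole ecosystem; this sketch uses dictionary order.)"""
    seen, cloners = {}, set()
    for pkg, files in packages.items():
        for content in files:
            if len(content) <= 1024:      # ignore small blobs
                continue
            sha = git_blob_sha(content)
            owner = seen.setdefault(sha, pkg)
            if owner != pkg:
                cloners.add(pkg)
    return cloners
```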
3.4 Threats to Validity

We chose our methods carefully to answer our research questions, and the survey in particular differs from a more typical statistically focused survey technique. We therefore describe the threats to the validity of the study before presenting the results, so readers can have these in mind as they read our findings.

As described, Study 1 used case selection criteria [92] appropriate for contrasting cases, but these cases may not be typical of all ecosystems, so one needs to be careful when generalizing beyond them. Our results may be affected by a selection bias, in that developers who did not want to be interviewed may have had different experiences. Finally, the differences we found among cases may be confounded with the reasons we selected them, such as their popularity or the availability of data about them.

---

\(^5\)Note that this weighs most heavily the state of packages for which more versions were released or that had more dependencies.

As for Study 2, as is typical of surveys in our field, our survey sample is not truly random; there may be selection bias relating to whom we were able to reach via the venues we chose. We tried to mitigate this by recruiting from forums, Twitter, and direct e-mail. The survey was also quite long (and was advertised as such up front). People with less patience for long surveys, or less interest in questions of breaking changes, values, and practices, may have self-selected out. This could be significant if people with little patience for long surveys also have different software-engineering practices and beliefs.

Another possible concern is that respondents may apply different standards in their ratings. For example, if the expectation of stability is extremely high in a particular ecosystem, then participants may rate the perceived importance of stability lower, because they are applying a very stringent standard for how focused everyone should be on stability. A similar focus on stability in a different ecosystem might lead participants in that ecosystem to rate the importance of stability higher. We tried to mitigate this by requiring at least 15 participants for each ecosystem, which should give some breadth of experience behind the responses.

While we tried to avoid using terminology that differed among ecosystems, we were not always successful. For example, the word “snapshot” means different things in different ecosystems’ practices, which caused some confusion. Even the term “breaking change” may be interpreted differently; for example, respondents might define it more narrowly as a change that simply would cause downstream compilation to fail, while we intended it to also include changes that would cause wrong behavior in downstream software.

Respondents may also have given answers to a few questions influenced by social desirability. For example, they may have felt obliged to say that “quality” is extremely important because that is the “right” answer, or to report following certain practices because they know those practices are expected. Our mitigation approach was ensuring confidentiality of responses and avoiding, to the extent possible, questions with clear desirable and undesirable responses.

We had difficulty recruiting sufficient participants from smaller ecosystems, such as Perl 6 or Clojure; small ecosystems may have different characteristics than large ones.
We do have two small ecosystems, Stackage and Lua, and they are outliers in some ways; further exploration of small ecosystems, for example with interviews or analysis of artifacts, should therefore be a priority for future work.

4 STUDY 1: QUALITATIVE MULTIPLE-CASE STUDY

In Study 1, we investigated the decision-making involved in making breaking changes, and the practices developers adopt to ease the resulting burden:

RQ1.1: How do developers make decisions about whether and when to perform breaking changes, and how do they mitigate or delay costs for other developers?

We also wanted to see how developers responded to breaking changes that affected them:

RQ1.2: How do developers react to and manage change in their dependencies?

Finally, we wanted to know whether developers perceived tensions between platform policies and their intended effects:

RQ1.3: Did platform policies or tools ever have unintended consequences?

4.1 Case Overview

To understand the different practices and policies we identified, it is important to understand the purpose and history of each ecosystem. In the following, we provide a brief description of all three ecosystems and their values, informed by both public documentation and our interviews. Platform-level features and practices relevant to breaking change are identified in Table 4.

4.1.1 Eclipse. The Eclipse foundation publishes more than 250 open source projects. Its flagship project is the Eclipse IDE, created in 2001. The IDE is built from the ground up around a plugin architecture; it can be used as a general-purpose GUI platform, and plugins can depend on and extend other plugins. Projects can apply to join the Eclipse foundation through an incubation process in which their project and practices come under the Eclipse management umbrella. It is also common practice to develop both commercial and open-source packages separately from the foundation, and to publish them in a common format on a third-party server. In addition, the “Eclipse marketplace” is a popular registry, listing over 1,600 external Eclipse packages that can be installed from third-party servers through a GUI dialog.

The Eclipse foundation coordinates a “simultaneous release” of the Eclipse IDE once a year and (as of 2016) three “update releases” for new features in between. Many external developers align with those dates as well.

The Eclipse foundation is backed by corporate members, such as IBM, SAP, and Oracle. Its policies are biased toward backward compatibility; packages (e.g., commercial business solutions) developed 10 years ago will often still work in a current Eclipse revision without modification.

A core value of the Eclipse community is backward compatibility. This value is evident in many policies, such as the “API Prime Directive: When evolving the Component API from release to release, do not break existing Clients” [25]. Although not entirely uncontroversial (as we will explain), this value was confirmed by many interviewees.

4.1.2 R/CRAN. The Comprehensive R Archive Network (CRAN) has managed and distributed packages written in the R language since 1997. R is an interpreted language designed for statistics. The R language itself is updated approximately every six months, but new development snapshots are available daily. R has multiple repositories with different policies and expectations, including Bioconductor and R-Forge; we focus on CRAN, the largest one.
CRAN formally exists under the umbrella of the R Foundation, but sets its own policies.

CRAN contains over 8,000 packages. Of these, 29 are either required or “recommended,” and are bundled in binary installs. About 2,200 more are cataloged as useful for 33 different specializations such as finance and medical imaging. Distributing R software as a CRAN package gives it high visibility, since installation from CRAN is automated in the command-line version of R and the popular IDE RStudio [69].

R and CRAN are used by many developers without a formal computer-science or programming background. CRAN pursues snapshot consistency, in which the newest version of every package should be compatible with the newest version of every other package in the repository. Older versions are “archived”: available in the repository, but harder to install. When a new package version is submitted to CRAN, it is evaluated by the CRAN team’s partly automated process. The package must pass its own tests and must not break the tests of any downstream packages in CRAN that depend on it without first alerting those packages’ authors so they can make corresponding fixes. Package owners need to react to changes in the platform or in upstream packages within a few weeks; otherwise their package may be archived.

A core value of the R/CRAN community is to make it easy for end-users to install the most up-to-date packages. Although not explicitly represented in policy documents, this value was apparent from many interviews; for example, R10 said, “CRAN primarily has the academic users in mind, who want timely access to current research.”

Table 4. Platform and Community-level Practice Choices: Who: (P)latform, (U)pstream, (D)ownstream, (3) Third party; Study 2 Method: (P)olicy Analysis, (S)urvey, (M)ining

| Who | Study 2 Method | Practice |
|-----|----------------|----------|
| P | P | Existence of centralized repository or directory of packages |
| P | P | Mechanism for referring to dependencies distributed outside official repositories (e.g., via GitHub directly) |
| P | P | Make historical versions of package easy or difficult to rely on |
| P | P | Mechanism to remove or reassign unmaintained packages (e.g., maintainers do not respond to emails) |
| P | S | Releasing changes on a fixed, advertised schedule per package |
| P | S,P | Ecosystem-wide synchronized release |
| P | P | Repository personnel check standards of submitted code before making available on the repository |
| P | | Allow multiple versions/only one version of a package to be loaded at the same time |
| P/U | | “Stability attributes” (in Rust) saying which API points will not change |
| P | | Use nightly unstable builds to get exciting new features (at cost to compatibility for downstream users) |
| P | | Disallow wildcard dependencies |
| P | | Test compiler changes against all published software using it to prevent breaking things |
| P | | Constrained rules about version numbering (e.g., cargo disallowing wildcards) |
| 3 | P | Third-party curation of sets of useful packages or compatible versions |
| P | | Dynamic language feature to help backward compatibility (optional parameters in R) |
| P | | Centralized testing infrastructure for all packages |
| P | | Vulnerability tracking (e.g., Node security platform) |
| U | S | Private arrangement among package authors to release at the same time |

For an ecosystem-by-ecosystem breakdown of policies, see Section 5.
4.1.3 Node.js/npm. Node.js is a runtime environment for server-side JavaScript applications, released initially in 2009, and npm is its default package manager. npm provides tools for managing packages of JavaScript code and an online registry for those packages and their revisions. The npm repository contains over 250,000 packages and is growing rapidly.

The Node.js/npm platform has the somewhat unusual characteristic that multiple revisions of a package can coexist within the same project. That is, a user can use two packages that each require a different revision of a third package. In that case, npm will install both revisions in distinct places, and each package will use a different implementation.

A core value of the Node.js/npm community is to make it easy and fast for developers to publish and use packages. In addition, the community is open to rapid change. Ease for developers was one of the principles motivating the designer of npm [75]. Therefore, npm explicitly does not act as a gatekeeper; it has no review or testing requirements, and in fact the npm repository contains a large number of test or stub packages. The focus on convenience for developers (instead of end-users) was apparent in our interviews.

4.2 Study 1 Results: Planning Changes (RQ1.1)

We first discuss managing change from the perspective of a developer planning to perform changes that may affect downstream users. While we observed similar forces and concerns regarding change across all three ecosystems, we observed differences in how community values affect the ways package maintainers mitigate or delay costs for downstream users.

4.2.1 Breaking Changes: Reasons and Opportunity Costs. Although breaking changes to APIs are costly to downstream users in terms of interruptions and rework, our interviewees gave many reasons why they had to perform such changes; there are corresponding opportunity costs that arise when deciding not to perform a change, such as the cost of maintaining obsolete code, working around known bugs, or postponing desirable new features.

Obvious and expected reasons for breaking changes included requirements and context changes and rippling effects from upstream changes. Beyond that, we found surprisingly frequent mentions of stylistic and performance reasons, as well as difficult bug fixes.

**Technical debt.** Surprisingly, 12 interviewees (E3, E9, R1, R3, R4, R5, R6, R7, R8, N1, N7) mentioned concerns about technical debt, rather than bugs, new features, or rippling upstream changes, as the trigger for breaking changes. By technical debt, we refer to code that is functionally sufficient but has outstanding stylistic issues developers want to fix, such as poorly chosen object models or method names, lack of extensibility or maintainability, or little-used or long-deprecated methods.

We conjecture that the reason interviewees brought up these kinds of changes so often in discussion was that they had thought about them in depth. Technical debt often arises from the tension between tools and practices that encourage developers to preserve backward compatibility (e.g., Eclipse’s “prime directive”) and the general pressure for evolution and improvement. Developers often postpone breaking changes until the technical debt becomes intolerable; for example, E3 mentioned as the reason for planning to finally remove some deprecated code: “What we did there was to provide old methods as deprecated. But that gets quite messy.
At one point almost half of the methods were deprecated.” E9 similarly told us about an upcoming long-postponed major version change: “since we don’t do it often, probably once every five years, [...] let’s take advantage of that opportunity to do some of the things that would be good that we couldn’t do before.”

Old interfaces can come to seem old-fashioned and unattractive in a swiftly changing community. Three interviewees said they made breaking changes for syntactic reasons: to harmonize syntax (R1) or improve “weird” or “bad” names (R3, R4) in their interfaces. N7 talked about adopting a new JavaScript programming paradigm that was far more attractive: “You can’t just stay on that old stuff for forever, it’s just not going to work. And so we drastically rewrote the internals at the transport to be a stream, because that’s sort of, essentially what it is, right? Like, it’s a little stream that takes logs and sends them places.” However, four interviewees (E1, E5, E6, R6) talked about the opportunity costs incurred when such changes cannot be made: having to preserve old interfaces over long periods hindered attracting new developers, who are lured by cutting-edge technology. E6, for example, told us: “If you have hip things, then you get people who create new APIs on top of that in order to [for example] create the next graphical editing framework or to build more efficient text editors. These things don’t happen on the Eclipse platform anymore.”

**Efficiency.** Four interviewees (E6, R1, R4, N1) reported cases in which efficiency improvements required breaking changes. For example, N1’s package offered an API for requesting paged data that the server could not provide efficiently; they deprecated and eventually removed that function rather than spending money on hardware.

**Bugs.** Bug fixes were another reason for breaking changes (E4, E7, R7, R9). Bug fixes can break downstream packages if those packages depend on the actual (broken) behavior instead of the intended behavior. A lack of well-defined contracts in most implementations makes assigning blame and responsibilities difficult in practice. As E5 told us, “If someone likes the broken semantics, then they’re not going to like the fixed semantics.” Thus, even fixing an obvious mistake in code under the control of a single person can require significant coordination among many people.

Throughout our interviews, we heard many examples of how bug fixes effectively broke downstream packages, and of the difficulty of knowing in advance which fixes would cause such problems. For example, R7 told us about reimplementing a standard string-processing function and finding that it broke the code of some downstream users that depended on bugs his tests had not caught. R9 commented on the opportunity cost of not fixing a bug in deference to downstream users’ workarounds for it: “If the [downstream package] is implemented on the workaround for your bug, and then your fix actually breaks the workaround, then you sort of have to have a fallback … [pause] It gets nasty.”

4.2.2 Dividing and Delaying Change Costs. Our previous discussion already hinted that there is flexibility regarding who bears the costs of a breaking change.
For instance, a package’s developer can decide between making a breaking change, pushing rework costs to maintainers of downstream packages, or not making the change, accepting opportunity costs such as technical debt. Even when deciding to make the change, the developer faces strategic choices about how much effort to invest in reducing the interruption and rework costs for downstream users, and in affecting the timing of when those costs are paid (Table 5). For example, by documenting how to upgrade, the developer invests more effort to reduce the effort of downstream maintainers. Different developers and different communities have different attitudes toward who should pay the costs of a change and when, as we will show.

**Awareness of Costs to Downstream Users.** Almost all (24 out of 28) of our interviewees stated that, when possible, they avoid breaking changes that would affect downstream users. Reasons included looking out for their users’ best interests and knowing that costs to affected users would come back to them, as users ask for help adapting to the change, ask for the change to be reverted, or seek alternative packages. Two interviewees (E1 and R4) specifically mentioned concern for downstream users’ scientific research (R4: “We’re improving the method, but results might change, so that’s also worrying—it makes it hard to do reproducible research”).

Interviewees’ concern for impacts on users was tied to the size and visibility of the user base and the perceived importance and appropriateness of their usage. Nine interviewees across all ecosystems (E4, E5, E6, R1, R4, R6, R7, R9, N7) were aware of their users and were concerned specifically about the number of users affected and the quantity of complaints that a change would imply. R9: “I wanted to rename it to something that more specifically describes that this is actually a new V8 context, but, you know, I can’t because so many packages are already importing the new context function.” N1: “we happen to know that paging is not the feature that was [...] often used from Node module customers.” Another npm developer, N7, said: “...that was strictly a breaking change for [feature], and so we really didn’t want to break all the community [feature]. Like, we didn’t want all 700 of these to give out ‘the code you’re using, you have to upgrade…Good luck, bro.’” An R/CRAN developer, R7, said: “I’m very cautious about making changes to it, and then when I make changes I often regret it. Even for a small change on a package used by a lot of people, it improves 90% of people’s lives, but makes 10% of people’s lives worse, and 1% complain, which, with [package] can be a lot of people.” Three interviewees (E1, R4, R8) noted that their sensitivity toward avoiding breaking changes grew with experience and with a growing user base, as they learned from feedback received about earlier breaking changes.

Of course, some developers themselves also work on such downstream packages. Four of our interviewees mentioned doing so (E5, N4, N7, R6) (see the discussion in Section 4.3.1); these developers are presumably aware of the impact of the changes they make on their own other packages.

Only four developers were not particularly worried about breaking changes.
Three (E6, N1, N5) had strong ties to their users and felt they could help them individually (N5: “We try to avoid breaking their code—but it’s easy to update their code”). Interviewee N6 expressed an “out of sight, out of mind” attitude: “Unfortunately, if someone suffers and then silently does not know how to reach me or contact me or something, yeah that’s bad but that suffering person is sort of [the tree] in the woods that falls and doesn’t make a sound.”

Finally, developers described tradeoffs in fixing mistakes that downstream users had come to depend on. E8 talked about being stuck with a poor design: “If you make a mistake in your API [...] sorry, you’re stuck with it, so you have to kind of work around it.” R9 mentioned circumstances where users depended on buggy behavior, but the upstream code had to be fixed anyway: “After upgrading the parser some people complained that their script was no longer working. But the problem was that their syntax was invalid to begin with. It’s obviously their fault.”

**Techniques to Mitigate or Delay Costs.** Despite a strong general preference for avoiding breaking changes, there are many cases where the opportunity costs of not making a change are too high. Our interviewees identified several different strategies by which they, as package maintainers, routinely invest effort to reduce or delay the impact of their changes on downstream users.

**Maintaining old interfaces.** Across all ecosystems, preserving the old interface alongside a new one is a very common approach to mitigate the immediate impact of a change on downstream users. While specifics depend on the language and tools, common strategies to avoid breaking downstream implementations include documenting methods as deprecated and providing default implementations for new extension points or parameters. In these strategies, the package developer invests additional effort now to preserve backward compatibility, accepting technical debt in the form of extra code to maintain for some time, in exchange for preventing an immediate downstream impact of the change. The developer may at some later time clean up the code, affecting downstream users that have not updated in the meantime [68].

Similarly, many interviewees (E2, E3, E5–E8, R1, R6–R9, N1, N7) told us about various techniques to perform changes without breaking binary compatibility. These prevent rework costs for existing users at the price of more complicated implementations and harder maintenance in the changed package, while possibly also creating costs for new downstream users who have to deal with more complicated mechanisms.

**Parallel Releases.** Seven developers (E5, E6, R1, R2, R4, R7, R8) reported strategies to maintain multiple parallel releases, such that downstream developers can incorporate minor nonbreaking changes (e.g., bug fixes) without having to adopt major revisions. Node.js/npm’s caret operator\(^6\) allows package authors to support parallel releases with different version numbers: an author can publish an update 1.0.1 to their version 1.0.0, even after 2.0.0 has been released; users who wish to stay with the 1.* series but still receive updates may refer to version ^1 or ^1.x to receive anything less than 2.0.0. It is common practice to provide security patches,\(^7\) including for older releases.\(^8\)
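To make the caret convention concrete, here is a minimal sketch of the check it implies; this is not npm’s actual implementation (which lives in its semver package and also special-cases 0.x versions and pre-releases):

```python
def caret_allows(base, candidate):
    """Simplified npm caret check: "^1.2.3" accepts >=1.2.3 and <2.0.0."""
    parse = lambda v: tuple(int(p) for p in v.split("."))
    b, c = parse(base), parse(candidate)
    return b <= c and c[0] == b[0]     # same major version, no downgrade

print(caret_allows("1.0.0", "1.9.9"))  # True: still in the 1.* series
print(caret_allows("1.0.0", "2.0.0"))  # False: the 2.0.0 major bump is excluded
```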
In contrast, CRAN only supports sequential version numbering,\(^9\) causing some developers to fork their own packages (e.g., reshape2 was introduced as a backward-incompatible revision of reshape). However, R8 told us this is discouraged by CRAN: “Because <package>2, it’s the second version of <package>, at what point can you just freeze an API and leave it there, and jump n+1 version and just continue with that? I think there’s some lingo in [CRAN’s instructions for package authors] that they’d rather not have that.” In each case, the fact that they are adding code to multiple versions suggests that developers are investing significant additional effort to reduce the (immediate) impact on downstream users. For example, N1 told us that they were conservative about making major new versions, since their package “has changed major version numbers a lot over last few years, many things backported to earlier versions; irritating to do major revisions every couple of months.”

---

6https://docs.npmjs.com/misc/semver.
7Current npm security alerts are listed at https://www.npmjs.com/advisories.
8e.g., https://www.npmjs.com/advisories/1482.
9According to https://cran.r-project.org/web/packages/policies.html, “Updates to previously-published packages must have an increased version.”

A variant of this strategy is to maintain separate interfaces for different user groups with different stability commitments within the same package (see the façade pattern in Reference [30]). For example, interviewee E5 provided in parallel both a detailed and frequently changing API for expert users and a simpler and stable API that insulated less sophisticated users from most changes. Similarly, interviewee R1 has split packages into smaller packages, with the intention that each user could depend on only the parts relevant to them and would be exposed to less change. In both cases, the developer accepts the higher design and maintenance costs of multiple APIs in exchange for reduced impact on specific groups of users with distinct needs.

**Release Planning.** Individual developers and communities may take downstream users into consideration when planning when to release changes. R1 keeps versions of his package with a quickly changing API in a separate repository, and batches multiple updates together into less frequent CRAN releases when he wants to reach a broader audience. While in R/CRAN and Node.js/npm packages are released by individuals whenever they want, the core packages of the Eclipse community coordinate around synchronized yearly releases\(^{10}\) (a strategy also common in other package systems such as Debian\(^{11}\) and Bioconductor\(^{12}\)). Delaying releases may incur coordination overhead and opportunity costs in slowing down development for the changing developer, but it reduces the frequency (though not necessarily the severity) with which downstream users are exposed to changes and gives downstream users a planning horizon.

**Communication with users.** Finally, developers communicate in various ways with users to reduce the impact of a breaking change. Seven interviewees (E6, R4, R7, R8, R9, N6, N7) made early announcements to create awareness and receive feedback.
R7 explained that “two weeks or a month before the actual release, I do sort of a pre-release announcement on Twitter [and] tell people to use the README.” He told us during the validation phase that he has since written a script to email all downstream maintainers before a release.

Another reason for communicating with downstream users was to help them deal with the aftermath of change. In the simplest case, a developer could invest effort in documenting how to upgrade. Nine interviewees (E7, R2, R3, R7–R9, N1, N4, N5) mentioned knowing their users personally and being able to reach out to them individually; for example, N1 contacted users who were still using an old API to help them migrate, and N5 had most users present on-site and could therefore help them migrate their code. E7 went so far as to create individual patches for all downstream packages within the Eclipse core to get them to adopt a new interface and move away from an old deprecated one. In all these cases, package maintainers invest effort to reduce costs for downstream users.

4.2.3 The Influence of Community Values. The previously discussed techniques are mechanisms that developers can use to tweak who pays the costs of a change and when. Individual developers often adopt such patterns and, in fact, six interviewees (E1, R3, R4, R5, R8, N6) described gradual adoption of more formal processes over time, as they learned their value through experience. At the same time, we could clearly observe that attitudes and practices differ significantly among the three ecosystems and are heavily influenced by ecosystem values, tools, and policies.

---

\(^{10}\)https://wiki.eclipse.org/Simultaneous_Release.
\(^{11}\)https://www.debian.org/doc/manuals/debian-handbook/sect.release-lifecycle.ro.html.
\(^{12}\)According to https://www.bioconductor.org/developers/package-submission/, “There are two releases each year, around April and October.”

Table 5. Practices (Mostly Upstream) to Communicate and Mitigate Effects of Change

| Who | Study 2 Method | Practice |
|-----|----------------|----------|
| U | S | Freeze APIs to protect downstream users from change |
| U | | Release a major change as a new package name, rather than a new version |
| U | | Mark API points as deprecated to warn of future removal |
| U | | Remove deprecated API points eventually |
| U | | Parallel releases to protect users who do not want to upgrade |
| U | S | Release changes in a batch rather than as they are made, to make less churn for users |
| U | S | Write new code as backward compatible, possibly at the cost of incurring technical debt |
| U | S | Proactively notify users about upcoming changes |
| U | S | Assist users who are having trouble upgrading to a new version with a breaking change |
| U | S | Write a migration guide to help users upgrade |
| U | S | Write a change log to document compatibility problems with prior releases |
| U | S | Use semantic versioning to signal the kinds of changes being made |
| U/P | S | Platform rules requiring package authors to negotiate compatibility before releasing (snapshot consistency) |
| U | M | Continue critical updates to older versions, to give users a way to avoid an expensive major upgrade |
| U/P | | Ways to check that APIs have not changed, e.g., API tools, @since tags, documentation |
Eclipse. Developers are willing to accept high costs and opportunity costs to further Eclipse’s value of backward compatibility, especially for core packages. The community has developed educational material explaining Java’s binary compatibility and giving recommendations for backward-compatible API design [24, 25]. With API Tools,\(^{13}\) the community has developed sophisticated tool support to detect even subtle breaking changes and enforce change-related policies, such as adding @since tags to API documentation. Breaking changes in core packages are in fact very rare [38].

Even though they arguably make the platform harder to learn and maintain, Eclipse developers have identified and documented [25, part 3] workarounds for extending an interface while maintaining old interfaces, such as creating additional interfaces to avoid modifying existing ones (e.g., IDetailPane2, IDetailPane3, IHandler2) and runtime weaving. Deprecating interfaces and methods is common, but actually removing them is not;\(^{14}\) for example, like many other methods, org.eclipse.core.runtime.Plugin.startup() as of this publication was still included despite being deprecated for over 15 years.\(^{15}\) E6 noted that this backward compatibility prevents modernizing APIs, such as replacing arrays with collections.

---

13https://www.eclipse.org/pde/pde-api-tools/.
14e.g., a guide published by the Eclipse foundation about evolving APIs says that, “Obsolete API elements should be marked as deprecated and point new customers at the new API that replaces it, but need to continue working as advertised for a couple more releases until the expense of breakage is low enough that it can be deleted.” [25].
15This method was deprecated in 2004: https://github.com/eclipse/eclipse.platform.runtime/commit/a46e757a1938edb0a7109dafef349c3a3ffc58ea and was still present in 2020: https://github.com/eclipse/eclipse.platform.runtime/blob/9aedff3f2141631a8bc5fa6d1abe005ea633f107/bundles/org.eclipse.core.runtime/src/org/eclipse/core/runtime/Plugin.java.
16https://wiki.eclipse.org/Development_Resources/HOWTO/Release_Reviews.

The Eclipse community invests significant effort into release planning, at the cost of some resulting friction, as reported by multiple interviewees. E9: “Eclipse has a release process, and some projects have to release at the same time as the platform, some projects the day after, [so] you’re expected to be available a little bit before, so you can make sure that yours builds properly right? [...] So, that’s kinda a complexity.” The required coordination is invested toward ensuring stability and smooth transitions at a few plannable times for downstream users. An Eclipse release is a complex process with steps aimed at maintaining not only technical interoperability with prior versions, but also a consistent level of legal compatibility, usability standards, security, and so on.\(^{16}\) This culture of conservative change contrasts with what, for example, an R developer told us: R7: “On one hand I try to be careful, but on the other hand I don’t want to inflict harm and be like paralyzed by the fact that anything I do might make someone’s life worse.
Sometimes you have to be like go ahead and accept that things are going to break and it’s not the end of the world.”

In Eclipse, maintenance releases for old major revisions are not common (Table 7), presumably because with backward compatibility users can simply be told to update to the latest release.

R/CRAN. As the R/CRAN community values making it easy for users to get a consistent and up-to-date installation, developers invest significant effort to achieve consistency.

There is no policy against CRAN packages making changes that affect the larger body of code outside of CRAN. However, when changes affect other CRAN packages, upstream developers are asked to bear the significant extra cost of reaching out to and coordinating with maintainers of affected packages\(^{17}\) (termed “forward impact management” by de Souza and Redmiles [19]). Downstream maintainers may then also bear the cost of pressure to update their packages first, before the upstream developer can make a breaking change, to ensure that all CRAN packages are consistent. CRAN’s policy requires (and verifies) that developers maintain constant synchronization with each other, and 5 of our 10 R interviewees (R2, R3, R7, R8, R9) specifically mentioned reaching out individually to known downstream developers (in contrast to three Node.js interviewees (N1, N4, and N5) and one Eclipse interviewee (E7)). Synchronization is thus continuous, but more decentralized and localized than with Eclipse’s simultaneous releases.

---

17https://cran.r-project.org/web/packages/policies.html#Submission.
18https://cran.r-project.org/web/packages/policies.html#Submission.

Among our interviewees, five developers of specialized R packages targeted small, close-knit communities and knew their users personally. For example, R3 mentioned that “no one used” a feature, and when asked how they knew that, they replied that “statisticians working on a lot of medical imaging [...] type of applications in R is a very small community. There’s only so many people to know.” R3 said he got to know those users because of interactions about the dependency. Only one of our Node.js and Eclipse interviewees (E6) mentioned personal connections with downstream users, but our sample is too small to be sure this is not just sampling bias.

Consistency is enforced by manual and automated checks on each package update.\(^{18}\) The change management process is collaborative but also demanding of a maintainer’s time; R7 said the timeline to adapt to an upstream change “might be a relatively short timeline of two weeks or a month. And that’s difficult for me to deal with because I try to sort of focus one project for a couple weeks at a time, just so I can remain productive.” Node developers, in contrast, can ignore changes until they feel like updating (N5: “Why don’t we upgrade more often? It’s more work than you’d hope.”), while Eclipse developers rarely need to worry about change (e.g., E1: “When a new version comes out every year in July or whenever, I’d go ahead and test if my plugin works correctly in that new version; if it does, I don’t care much about that. [...] New features were mostly irrelevant.
I didn\u2019t care that much about that.\u201d\n\nThe platform is not conducive to multiple parallel releases\u2014on CRAN a package revision must have a higher version number than the one it supersedes, so an old major version cannot be updated; policies also discourage forking a project and submitting it with a separate name. There is no central release planning, perhaps because it is perceived to slow down access to cutting-edge research.\n\nOverall, we observed much more communication and coordination with downstream users about individual changes than in Eclipse, but also more flexibility with regard to performing breaking changes.\n\nNode.js/npm. The Node.js/npm community values ease for upstream developers and the possibility to move fast [75]. It is much less demanding for a developer to make a breaking change. Six of the Node.js interviewees talked about the importance of signaling change through semantic versioning.\n\nThis sharply contrasts with the R developers we asked about this: two R interviewees spoke out against semantic versioning; for example, R7: \u201cI\u2019m familiar with the semantic versioning stuff. It\u2019s just I don\u2019t find that useful personally, because most R users aren\u2019t familiar with that and I think [convention] is a little bit on the ridiculous side. [...] For most R users I don\u2019t think version numbers send a terribly strong signal, and they are likely to not know what version they are using currently anyway.\u201d\n\nSemantic versioning in Node allows developers to make breaking changes as long as they clearly indicate their intentions. Because the technical platform allows downstream developers to still easily use the old version without fearing version inconsistencies, breaking changes do not as easily cause rippling effects or immediate costs for downstream users. While they still avoid breaking changes and employ various strategies to maintain old interfaces, in our interviews, Node.js/npm developers were generally willing to perform breaking changes in the name of progress and in fighting technical debt, including experimenting with APIs until they are right. For example, N6 told us that if a downstream user was concerned about a breaking change: \u201cI could tell this person, well look if you have this problem at least for now your workaround is very simple. Change your dependency to be this exact dependency so instead of saying we depend on package foo version *. Change it to just exactly that version [...], and you will still be using the old one that you know and love. And that will postpone your problem until the day that you need some new thing that\u2019s come out which is no longer backported into the old version. [...] So knowing that, I do kind of feel kind of confident enough to just say yeah we\u2019re gonna bump the major version, we\u2019re gonna announce or whatever that takes, but I don\u2019t really myself feel too much desire to kind of read for the backward compatible people.\u201d\n\nAs mitigation strategy, maintenance releases for old versions are common, made easy by the platform and associated tools. Analyzing the npm repository, we found that 24 of the 100 most \u201cstarred\u201d packages did this at least once; this was more common than in Eclipse or R/CRAN (Table 7).\n\nSummary of RQ1.1 results: Developers are motivated to change code for many reasons, such as requirements and context changes, bugs and new features, rippling effects from upstream changes, and technical debt from postponed changes. 
There are also opportunity costs from forgoing or postponing changes. Opposing this motivation is developers’ awareness of the costs such changes impose on downstream users, especially when their userbase is large and visible to them; in most cases developers want to avoid imposing those costs on users. Their choice is not binary, however; there are ways of softening the impacts of change, such as maintaining old interfaces, making parallel releases, and making and communicating plans about upcoming changes. Developers weigh these choices differently depending on the ecosystem’s values: Eclipse core package developers are heavily discouraged from change, and thus opt for techniques that allow strictly backward-compatible additions. R/CRAN developers are not officially discouraged from making changes, but they are aware that the ecosystem’s rules (no parallel releases, onus on downstream users to update) are burdensome for downstream users, so they emphasize communication and collaboration in their updates. Node.js/npm developers are encouraged to make changes by mechanisms that signal downstream users about changes while insulating them from the requirement to adopt those changes; as a result, upstream developers are quite likely to opt for change, and to police each other’s rigorous use of the signaling mechanism for change (semantic versioning).

---

19https://cran.r-project.org/web/packages/policies.html#Submission.

4.3 Study 1 Results: Coping with Upstream Change (RQ1.2)

Just as upstream developers have some flexibility in planning changes that may affect downstream developers, downstream developers have flexibility regarding whether, when, and how to react to upstream change, again influenced by values, policies, and technologies (Table 6). Having to monitor and react to upstream change can be a significant burden on developers (e.g., a mismatch between schedules has been shown to be a barrier to collaboration [42]). The urgency of reacting to change can depend significantly on the development context and platform mechanisms.

When discussing how frequently they react to upstream change, our interviewees described a spectrum ranging from never updating (E3) to closely monitoring all changes in upstream packages (N1, N2, R9). Two interviewees mentioned explicitly ignoring certain upstream changes (N3, N7); others upgraded dependencies only at the time of their own releases (N3, N5) or during deliberate house-cleaning sweeps (N7, E2). Even when the platform does not require updates, developers often prefer to update their dependencies to incorporate new fixes and features (E3, N2) or to avoid accumulating technical debt (R6, N5). But they avoid updating when updates require too much effort (e.g., by causing complicated conflicts; N5, E3) or cause too much disruption downstream (N7).

4.3.1 Monitoring Change. When developers have to, or want to, react in a timely fashion to upstream changes, they need to monitor the upstream projects in some way. The platform itself (e.g., Node.js, R core, and the CRAN infrastructure) is often an additional source of changes that developers need to keep up with. In our interviews, we discovered many different strategies for monitoring, including technical and social ones.
These strategies varied along with the urgency of developers’ needs, from active monitoring of upstream activity, to general social awareness of upstream activities, to a purely reactive stance where developers wait for some kind of notification.

**Active monitoring.** Only four interviewees (E5, R9, N1, N4) reported actively monitoring upstream changes, in the sense of maintaining personal awareness of upstream changes by regularly looking at activity in their upstream dependencies. R9, N1, and N2 said they used GitHub’s notification feed with some regularity (N2 only for changes to the Node.js platform, not to upstream packages). N4 kept up by following Twitter feeds and blogs and by attending conferences. R7 indicated that raw notification feeds, in their current form, are a significant burden with a low signal-to-noise ratio, saying that “The quantity of notifications I get on GitHub [on my own project] already is to the point of overwhelming. So I don’t even mostly read them unless I’m actually working on the project at that moment.” He later told us that after our interview he tried scaling back to watching just the three to five projects he is actively working on. Only one interviewee (R9) did not feel overwhelmed, saying that occasional skimming of GitHub feeds was a useful way to get an overview of activity.

**Upstream participation.** In seven cases, developers mentioned monitoring upstream changes not as outsiders following a stream of data, but as active participants in those projects, collaborating to influence them toward their own needs (E5, N4, N7, R6) or providing direct contributions to those packages (E7, E9, R7). For example, in describing the challenge of getting upstream projects to prioritize changes that he needed, an Eclipse developer said, “I touch everything that I care about, because it’s really hard to convince other people to do things that I need to do. I find it much easier to just learn all the projects and when I need something, to do it myself.” This aligns with de Souza and Redmiles’ observation of exchange of personnel as a common strategy for cooperation among dependent projects [19]. Such developers wear hats in both projects: as downstream developers, they maintain active awareness of the upstream project; as upstream developers, their downstream work informs their understanding of the upstream project’s requirements.

Others, like E5, actively compiled and tested their project with development versions of upstream dependencies, emphasizing the importance of giving timely reactions: “if you report it within a week there’s a better chance the developer might remember what they did […] which provides a good chance that they can revert their change before they hit their milestone.”

**Social awareness.** Many interviewees tried to maintain a broad awareness of change through various social means. The most frequently mentioned mechanism, especially in the Node.js community, was Twitter (E9, R7–R9, N2, N3, N4a, N4b, N6, N7). For example, N4a commented, “the people who write the actual software are fairly well connected on Twitter, […] like water cooler type of thing. So we tend to know what’s going on elsewhere.” In each ecosystem, interviewees (E5, R9, N4, N6) mentioned the importance of face-to-face interactions at conferences for awareness of important changes in the ecosystem.
Other social mechanisms mentioned for learning about change were personal networks (R6, R8), blogs (E1, R4, R7, R8, N4, N7), and curated mailing lists (N1).

**Reactive monitoring.** Although our research questions led us to probe interviewees about the aforementioned active and social monitoring practices, a reactive strategy is also possible for dependencies. That is, rather than maintaining some awareness and understanding of plans and activity in an upstream project (for example, by watching a GitHub feed and keeping track of why they follow each project and which changes might be relevant to them), a developer may instead ignore upstream projects' activity until given actionable evidence that their own project needs to adapt in some way. The developer waits to hear about problems from others, in advance or after things have broken: upstream developers contacting them about breaking changes, failing tests after dependency updates, or platform maintainers warning of changes that would affect them. There are tools that enable this reactive stance by generating targeted notifications on certain kinds of changes. The specific tools differ among the platforms and support different practices or policies. Policies and common practices (e.g., testing practices) in the platform in turn strongly affect the reliability of a reactive strategy and of the corresponding tools.

Four developers (R3, E5, N2, and N7) mentioned the use of continuous integration to detect compile-time issues caused by breaking changes in upstream packages early. The tools gemnasium [32] and greenkeeper [35] allowed Node.js/npm developers to get notifications about new releases of upstream packages. Gemnasium alerted developers of package releases that fix known vulnerabilities, whereas greenkeeper submitted pull requests to automate a continuous integration run against the new release. In either case, developers could react to notifications by email or pull requests.

Table 6. Practices (Mostly Downstream) to Monitor Change and Manage or Avoid Its Effects

| Who | Study 2 Method | Practice |
|-----|----------------|----------|
| | | Awareness and coordination |
| D | S | Reactively track what upstream packages are doing (when it breaks; when you're notified somehow) |
| D | S | Proactively track (maintain awareness via GitHub notifications, mailing lists, etc.) |
| D | S | Submit feature requests and bug reports to upstream package authors |
| D | S | Participate in decision-making about upstream package's future |
| D | S | Tool-based notifications about upstream changes (e.g., Greenkeeper) |
| D | | Regularly test against unreleased development versions of dependency to give timely feedback |
| P | | Socially connected group of developers following each other on Twitter, going to conferences, etc. |
| P | | Political work among core people to get buy-in on making a breaking change |
| | | Protection against each potential change |
| D | S | Do not update dependencies; just leave them at old versions known to work |
| D | | Upgrade dependencies all at once only when making a new release |
| D | S | Dependency hell: manual manipulation of dependency version constraints to get a set of dependencies to be mutually compatible |
| U | S | Violate semantic versioning for trivial changes to prevent rippling updates that a version change would require |
| D | M | Lock file: fix versions of all upstream packages (incl. transitive dependencies) with release |
| D | | Report wrong semantic versioning as a bug |
| D | M,S | Specify an exact version number of a specific dependency |
| D | M,S | Specify a range of legal version numbers of dependencies (e.g., allow minor but not major upgrades) |
| D | M,S | Specify only a dependency's name and do not constrain what version of it is to be used |
| | | Protection against dependencies themselves |
| D | S | Do significant research about each dependency weighing whether to adopt it |
| D | S | Wrap the dependency in an abstraction layer to decrease risk of change |
| D | S | Avoid use of dependencies, roll your own |
| D | S,M | Clone the dependency's code and maintain the new code yourself |
| D | M,S | Copy dependency code into your own repository ("vendoring") to get exact version needed |

Who: D = downstream developers, U = upstream developers, P = the ecosystem community at large. Study 2 method: S = covered by the survey, M = measured by data mining.
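The notification tools above all automate the same reactive pattern: try a dependency's new release in a build, run the tests, and surface the result as an actionable signal (greenkeeper did this by opening a pull request). Below is a minimal sketch of that pattern, assuming a pip-based project with a pytest suite; the package name and version are hypothetical, and this illustrates the idea rather than the actual tools:

```python
import subprocess

def try_new_release(package: str, version: str) -> bool:
    """Install one candidate dependency release and run the test suite against it."""
    # Install exactly the candidate release (assumes a pip-managed environment).
    subprocess.run(["pip", "install", f"{package}=={version}"], check=True)
    # The test suite's exit code is the actionable signal a reactive
    # developer waits for, instead of following upstream activity directly.
    result = subprocess.run(["pytest", "-q"])
    return result.returncode == 0

if __name__ == "__main__":
    ok = try_new_release("requests", "2.32.3")  # hypothetical upstream release
    print("update looks safe" if ok else "update needs attention")
```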
CRAN's requirement that upstream developers notify their downstream dependents when a change is coming appears to encourage downstream developers across the ecosystem to take a reactive stance (in contrast to Eclipse and Node.js/npm, where individual downstream developers need to employ optional monitoring tools). R7 defended the practice of waiting to be told about breaking changes as a principled, attention-preserving choice, consistent with ecosystem norms, while R2 was apologetic about being reactive: "I guess I'll sound crass about this and say it. For things like that I would wait to hear from CRAN when something broke. Because I don't think I can keep up with all of it." CRAN enforces this policy with manual and automated checking on each package update, running the package's tests and the tests of all downstream packages in the repository, as well as some static checks. The CRAN team may then warn an affected downstream developer of an upcoming change by email.

4.3.2 Reducing the Exposure to Change. Many developers have developed strategies to reduce their exposure to change from upstream modules and, thus, to reduce their monitoring and rework efforts. The degree to which developers adopt such mitigation strategies again depends on the technology, policies, and values, as we will discuss.

**Limiting dependencies.** Most of the CRAN and Eclipse interviewees that we asked (11 interviewees: R1, R2, R3, R4, R6, R7, E1, E2, E4, E5, E9) felt that it was better to have fewer dependencies. Reasons for limiting dependencies included limiting one's exposure to upstream changes and not burdening one's users with a lot of modules to install and potential version conflicts ("dependency hell"). Interviewee E5 represents a common view: "I only depend on things that are really worthwhile. Because basically everything that you depend on is going to give you pain every so often. And that's inevitable." Apart from removing no-longer-needed dependencies (for which Eclipse provides tooling), six developers described more aggressive actions to avoid dependencies, including copying (R4) or recreating (R1, R6, R7, N6) the functionality of another package.
N6 had to fork and recreate an upstream dependency as a temporary measure because of a licensing issue, but he did not feel dependencies were a burden generally.

In contrast, due to Node.js/npm's ability to use old versions and Eclipse's stability, three developers (E3, N1, N5) specifically said that they did not see dependencies as a burden.

**Selecting appropriate dependencies.** When limiting themselves to appropriate dependencies, interviewees mentioned a variety of different signals they looked for; these fell into five categories:

- **Trust of developers:** Seven interviewees (E4, R1, R5, R6, R7, N4, N6) mentioned basing decisions on personal trust of package maintainers. Criteria included being a large organization (E4), having a reputation for high-quality code (R6, N6), and being consistent with maintenance (R6). One interviewee (R7) deliberately sent bug reports to a package to test whether the developer would be responsive before depending on it.

- **Activity level:** Five interviewees (E4, N6, N2, R1, R6) considered the activity level of the community of developers, for example, distinguishing a "real" ongoing project from an abandoned research prototype. Both high and low activity levels can be a positive indicator depending on the state of the project, as stated by N2: "Ones with activity are mostly better maintained; they have lots of people contributing, like express. It's likely the community will have eyes on the ball, consider backward compatibility, ramifications […] Ones with little activity are small projects that don't change often, so change isn't an issue either."

- **Size and identity of user base:** Four developers gauged the size of the user base using signals such as daily download counts (E2, N3, N5), whether projects of trusted developers use the package (N6), or, as E2 said, "Whether I'll actually jump on it or not is about how I perceive other software projects are using it." N5 told us, "We look to see how many people are using it: number of downloads per day. If it's low, that's a clue that it's sketchy, but not a perfect heuristic."

- **Project history:** Four interviewees said they assumed that past stable behavior of a package would predict future stability (R1, R4, R6, E2). Signals included their own experience with the package (N4, E5), its status as part of the platform's core set of packages (E4), or its visible version history, such as a lack of recent updates and a version number above 1.0 (E3, N1, N4).

- **Project artifacts:** Finally, developers mentioned signals from project artifacts, including coding style (R1, R6), documentation (R1), good maintenance (N6), perceived ease of adoption (R1), code size (E2, N4, N7), and conflicts with other dependencies (N5).

**Encapsulating change.** Interestingly, there was almost no mention of traditional encapsulation strategies to isolate the impact of changes to upstream modules, contrary to our expectations and to typical software-engineering teaching [63, 73, 88]. Only N6 mentioned developing an abstraction layer between his package and an upstream dependency, implemented because of an anticipated change. Questions about encapsulation were not in our interview protocol, so we did not ask about it specifically, but one possible explanation is that since upstream packages already generally try to avoid gratuitous API changes, the changes that are necessary would require changes to an encapsulating class's API as well, obviating the point of the encapsulation.
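The abstraction layer N6 described can be pictured as a thin adapter that is the only module allowed to call the dependency directly, so that an upstream interface change is absorbed in one place. A minimal sketch of the idea; the class, the fake SDK, and all names here are hypothetical, not taken from any interviewee's code:

```python
class StorageAdapter:
    """The only code in the project allowed to touch the upstream SDK.

    If the upstream client renames or reshapes its API, only this class
    changes; the rest of the project keeps calling fetch().
    """

    def __init__(self, client):
        self._client = client  # e.g., an object from a third-party SDK

    def fetch(self, key: str) -> bytes:
        # An upstream rename of get() would be absorbed on this line
        # rather than rippling through the codebase.
        return self._client.get(key)


class FakeClient:
    """Stand-in for a third-party SDK object (hypothetical API)."""

    def get(self, key: str) -> bytes:
        return b"payload for " + key.encode()


adapter = StorageAdapter(FakeClient())
assert adapter.fetch("report.csv").startswith(b"payload")
```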
### 4.3.3 Platform Values and Developer Values.

Because policies, tools, and practices support different values in each ecosystem, they impose different costs on developers depending on whether their attitude toward some particular dependency aligns or conflicts with the community's broader values. In some situations developers will treat a dependency as a fixed resource to draw functionality from (also termed API as contract [20]), but in other situations they treat the interface as open to negotiation and change (also termed API as communication mechanism [20]).

Eclipse's value on backward compatibility and predictable release planning is convenient for developers and corporate stakeholders who wish to rely on the released core platform code as a fixed resource. Stability ensures that most developers relying on the platform packages do not need to monitor upstream changes, reacting at most to the yearly releases. Signals about whether to trust an upstream package are primarily social, in the sense that developers can trust the packages that are part of the core, supported by corporations known to be invested in the stability of the platform.

According to E6, developers working within more volatile parts of the Eclipse ecosystem, such as those using code outside the stable core or in-development features of the core, have a greater need for monitoring and may be exposed to more change, sometimes encountering friction as a result. E6 told us that "there is a very different understanding of how important compatibility is and what it means, if you start from the platform, and then to the outer circles of Eclipse." E5 talked about recompiling upstream code often in order to report bugs upstream within a week. Thus, although Eclipse deeply values stability, there is necessarily a sphere of activity with active collaboration and change where that value is appropriately set aside.

CRAN's emphasis on consistency and timely access to research seems to encourage the API-as-communication rather than the API-as-contract [20] view of dependencies, in that its snapshot consistency approach forces maintainers to react to breaking upstream changes quickly (typically within a few weeks [87]). This causes some apparent friction with researchers who might otherwise wish to publish their software and move on to other things. Many of the interviewees limited their dependencies, sometimes quite aggressively, by replicating code and reacting to notifications about change rather than actively following a community of upstream developers. However, an active and socially connected subset of developers (R7–R9) seemed to welcome collaboration. Although R7 advocated reacting to upstream changes rather than trying to anticipate them, R7, R8, and R9 emphasized Twitter and conferences as ways to maintain upstream awareness.

Node.js/npm's emphasis on convenience for developers has led to infrastructure that seems to decouple upstream and downstream developers from having to collaborate, since the downstream can depend on old versions of the upstream for as long as they like. This should logically lead to less urgency to monitor upstream changes, except for patching security vulnerabilities. Developers do nonetheless often choose to take a collaborative approach to development, using tools such as continuous integration and greenkeeper [35] to force themselves to stay up to date despite the platform's permissiveness.
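This decoupling rests largely on semantic-versioning ranges: a downstream package declares which releases it is willing to accept, and the installer will not force it past that range. A simplified sketch of npm's default caret semantics, ignoring pre-release tags and the special-cased 0.x versions:

```python
def parse(version: str) -> tuple[int, int, int]:
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch

def satisfies_caret(declared: str, candidate: str) -> bool:
    """True if `candidate` is acceptable under an npm-style caret range.

    ^1.4.0 accepts any 1.x.y release at or above 1.4.0 but never 2.0.0:
    minor and patch releases flow downstream automatically, while a major
    (breaking, under semantic versioning) release requires explicit opt-in.
    """
    return (parse(candidate)[0] == parse(declared)[0]
            and parse(candidate) >= parse(declared))

assert satisfies_caret("1.4.0", "1.9.2")      # minor update: accepted
assert not satisfies_caret("1.4.0", "2.0.0")  # major update: blocked
```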
**Summary of RQ1.2 results:** Downstream developers are motivated to update their dependencies to take advantage of bug fixes and new features and to avoid technical debt. However, such updates can be complex or risky, can disrupt downstream users, and may require some awareness of ongoing activity in an upstream project. Strategies to balance the costs and risks include different levels of awareness of upstream projects (from social or technical participation, to active or merely reactive monitoring), chunking the work by making all updating decisions at once periodically, or limiting the problem by carefully vetting dependencies to begin with. As with upstream change decisions, the ecosystem's context affects participants' choices. Eclipse's extreme interface stability allows downstream developers, at least outside the core, to trust it and ignore the possibility of change. CRAN's policy of global consistency among packages creates pressure for package maintainers to actively collaborate with their upstream counterparts; a core community seems to be spurred to active collaboration on Twitter and at conferences, while a peripheral community limits dependencies to avoid this necessity. Finally, NPM's tooling decouples downstream developers from the immediate impact of upstream changes; developers who nonetheless wish to stay up to date adopt tools like greenkeeper to remind and encourage them to update.

### 4.4 Study 1 Results: Unintended Consequences (RQ1.3)

Interviewees told us about instances where policies or their combinations led to unintended consequences.

**Eclipse.** One Eclipse developer said that the "political" nature of making changes can drive away developers and users. "You have to be very patient and know who to talk with and whatnot; you really have to know how to play that game to get your patches accepted, and I think it's very intimidating for some new people to come on." He explained that with many interdependent packages managed by different people, each with a mandate not to change their interfaces, implementing a rippling change can require negotiations among people with conflicting interests.

Another consequence of Eclipse's stability, along with its use of semantic versioning, is that many packages have not changed their major version number in over 10 years. However, as E8 told us, strict semantic versioning is impractical to follow, so even for the few cases of breaking changes that are clearly documented in the release notes, such as removing deprecated functions, major versions are often not increased. Updating a major version number can ripple version updates to downstream packages, which can entail significant work for the many downstream projects that have hard-coded major version numbers for their dependencies.

**Node.js/NPM.** For Node.js/npm, in contrast, the rapid rate of changes and automatic integration of patches can raise concerns about reproducibility in commercial deployments. In many cases, the community then builds tools to work around some of the issues, such as tools that take a snapshot of an installation including all transitive package dependencies (e.g., "npm shrinkwrap" or R/CRAN's packrat). "In npm, if you install today and tomorrow, you'll get 100s of dependencies, and something may have changed. So even if my version is the same, the servers could be running slightly different code, so customer facing code will differ and be hard to reproduce."
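Such snapshot tools all implement the same basic idea: record the exact resolved version of every package in the transitive dependency tree once, then install from that record instead of re-resolving version ranges. A conceptual sketch with invented data; this illustrates the lock-file idea, not npm shrinkwrap's actual file format:

```python
# Resolved once, at a known-good moment; every entry is exact, including
# transitive dependencies the developer never asked for directly.
LOCKED_VERSIONS = {
    "express": "4.18.2",
    "qs": "6.11.0",       # transitive dependency of express, pinned too
    "left-pad": "1.3.0",
}

def install_from_lock(lock: dict[str, str]) -> list[str]:
    """Reproduce tomorrow exactly the tree that was resolved today."""
    return sorted(f"{name}@{version}" for name, version in lock.items())

print(install_from_lock(LOCKED_VERSIONS))
# ['express@4.18.2', 'left-pad@1.3.0', 'qs@6.11.0']
```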
**R/CRAN.** CRAN has a similar issue regarding scientific, rather than deployment, reproducibility: The community's goal of timely access to current research conflicts with many researchers' goal of ensuring the reproducibility of their studies [61].

In R/CRAN, the opposite dynamic from Node.js is evident in its versioning policy: The official policy on version numbers requires only that version numbers increase with each submission\(^{20}\), but a permissive form of semantic versioning is used and recommended by many developers [87, 91].

These conflicts and unintended consequences suggest that the design of ecosystem practices is not a solved problem.

**Summary of RQ1.3 results:** Unexpected community responses to policies included creative use of semantic versioning, innovative ways of promoting replicability, and stagnation.

5 STUDY 2: A SURVEY ON VALUES AND PRACTICES: PREVALENCE, CONSENSUS, AND RELATIONSHIPS

The research questions for Study 2 emerged in large part from the results of our first study. Study 2 endeavored to expand the scope beyond these three cases and to ask further questions raised by our results.

Study 1 revealed substantial differences among our three cases in the practices used to manage breaking changes and in the values these practices appeared to serve. This raises the question of how prevalent such differences are. Some values may be nearly universal, and some practices may be so fundamental, well known, and effective that they are employed by nearly all ecosystems. However, different ecosystems make use of different technologies, have evolved different cultures, and serve different constituencies, suggesting that at least some values and practices may vary, perhaps dramatically, among ecosystems. Our questions for Study 2 were therefore:

RQ2.1: To what extent are values and practices for managing breaking changes shared among a diverse set of ecosystems?

Moreover, we have been making the assumption that ecosystems tend to have a shared view of values and practices across the ecosystem, i.e., that these are characteristics of ecosystems rather than of individual projects or sub-ecosystem clusters of projects. It seems important to test this assumption, hence:

RQ2.2: To what extent do individual ecosystems exhibit consensus within the community about values and practices?

Finally, as we observed in Study 1, it seems that some practices are designed to serve the ecosystem's values, e.g., to insulate an installed base of applications from changes (Eclipse), to make it easy for end users to install and use the latest software (R/CRAN), or to allow developers to contribute code as simply as possible (Node.js). Are particular values always associated with specific practices that further that value?
We ask more generally:

RQ2.3: What is the relationship between ecosystem values and practices?

Anonymized survey data is available [7].

---

\(^{20}\)https://cran.r-project.org/web/packages/policies.html

5.1 Study 2 Results: Validation of Study 1

Before presenting new results from the survey, we take the opportunity to validate some of the results of Study 1, since we have available hundreds of survey responses covering similar questions from the three ecosystems in that study.

Study 1 characterized the practices and values of three ecosystems based on interviews with developers in each ecosystem. The values it inferred for Eclipse and Node.js/NPM align with our data: Eclipse participants did seem to value backward compatibility as postulated: stability and compatibility were their two highest ranked values (Table 10). Aligning with findings from the interviews, Eclipse developers were top-ranked in claiming to make design compromises in the name of backward compatibility (Figure 3(c)). Aligning with the interview result that showed Node.js developers to value ease of contribution for developers, Node.js participants in our survey were top-ranked in valuing innovation and ranked highly both in making frequent changes to their own package (Figure 3(a)) and in facing breaking changes from dependencies (Figure 4(a)), although they were only mid-ranked among ecosystems in feeling constrained from making changes (Figure 3(b)).

CRAN survey participants did not rank rapid access as highly as the interviews had led us to expect, and they were not more averse to adopting dependencies as predicted (not shown), although, as predicted, they did claim to clone code more often (not shown). Aligning with interview results discussing personal contacts among upstream and downstream developers, they were top-ranked in reporting being personally warned about changes in their dependencies (Figure 4(e)) but, contrary to expectations, were low-ranked in warning their own downstream users (Figure 3(h)). This contrast in particular, i.e., frequently being warned but rarely issuing warnings, suggests that our R/CRAN interviews may be overweighted toward downstream developers.

Although the survey largely validates the interview results, the differences highlight the fact that different methods with different sampling strategies can produce somewhat different results, and that even the design intentions of core members responsible for promulgating practices are not necessarily propagated to the whole community.

5.2 Study 2 Results: To What Extent Are Values and Practices Shared across Ecosystems? (RQ2.1)

The survey, policy analysis, and data mining revealed an interesting pattern of similarities and differences in values and practices across ecosystems. For those that vary across ecosystems, it is rare that we see a clear division of ecosystems into two distinct groups. Rather, sorting tends to generate a smooth curve between the extremes. Visible differences between ecosystems at either end of the spectrum are generally statistically significant, and often a few ecosystems stand out, as we will discuss. We plot answers to many of our survey questions in Figures 2, 3, and 4 and Table 7.

All values, except for commerce (Figure 2), were considered at least "somewhat important" in all ecosystems.
Stability, quality, and community are nearly universal values, and compatibility, rapid access, and replicability are also rated highly across most ecosystems (see the bottom rows of Figure 2 for the few exceptions). For quality in particular, participants felt even more strongly, and more consistently, that it was of high importance to them personally and to the ecosystem as a whole. Still, we see strong differences between ecosystems at each end of the spectrum. Personal values correlate strongly with perceived community values (Spearman $\rho = 0.416, p < .00001, n = 10878$, comparing the two answers for each of the eleven values, for each person, as a separate observation), but participants, on average, rated quality much higher personally than they rated it as an ecosystem value (.9 Likert scale points, paired t-test: p < .0001); they also tended to rate fun slightly higher personally (.6 Likert scale points, paired t-test: p < .0001); all other differences were within half a Likert scale point.

Table 7. Comparison of Data-mined Practices (Data from libraries.io and World of Code [57]; see Section 3.3.6 for Details)

| Ecosystem | (a) Exact | (b) min only | (c) range | (d) unconstrained | (e) Cloning | (f) Lock Files | (g) Maint. old vers. |
|--------------------|-----------|--------------|-----------|-------------------|-------------|----------------|---------------------|
| Atom (plugins) | 22.5% | 1.55% | **73.7%** | 1.29% | 2.62% | 0.1% | 1.8% |
| CocoaPods | – | – | – | – | – | 8.37% | 3.85% |
| Eclipse (plugins) | – | – | – | – | – | n/a | – |
| Erlang,Elixir/Hex | 9.09% | 9.25% | **81.6%** | 0.0% | – | 65.7% | 3.95% |
| Go | – | – | – | – | 3.24% | 14.4% v | – |
| Haskell (Cabal/Hackage) | – | – | – | – | – | 0.5% | 1.04% |
| Haskell (Stack/Stackage) | – | – | – | – | – | 0% | n/a |
| Lua/Luarocks | – | – | – | – | 3.21% | 0% | – |
| Maven | **100.0%** | 0% | 0% | 0% | 0.72% (Java) | n/a | 25.4% |
| Node.js/NPM | 16.3% | 0.44% | **78.6%** | 3.67% | 7.03% | 0.8% | 3.96% |
| NuGet | 5.27% | **88.7%** | 6.01% | 0% | – | 7.2% | 17.6% |
| Perl/CPAN | **100.0%** | 0.0% | 0.0% | 0.0% | 2.30% | 1.0% | 2.72% |
| PHP/Packagist | 21.3% | 3.72% | **66.7%** | 7.99% | 1.16% | 16.9% | 10.6% |
| Python/PyPi | 14.6% | 34.5% | 5.86% | **44.1%** | 8.17% | n/a | 6.07% |
| R/Bioconductor | – | – | – | – | 3.59% | 0.2% | n/a |
| R/CRAN | 0.0% | 24.4% | 0.0% | **75.6%** | 2.69% | 0.8% | 0.10% |
| Ruby/Rubygems | 3.78% | **49.6%** | 46.3% | 0.94% | 1.76% | 17.4% | 4.54% |
| Rust/Cargo | 3.86% | 2.14% | **93.6%** | 0.40% | 6.90% | 14.6% | 1.4% |

Dependency Version Constraints: Over all versions of packages in our data, over each of the packages' dependencies, the proportion of dependencies that were (a) constrained with an exact version number, (b) specified with a minimum version only, (c) constrained to a range of versions, or (d) left unconstrained. Dash (–) means no data (dependencies not tracked in libraries.io, or language files not indexed in WoC). The most common type of constraint for each ecosystem is bolded.

Cloning is the percent of packages in the repository whose projects borrowed a file from another package. Maint. old vers. is the percent of packages whose version number does not increase monotonically. Lock files is the percentage of packages that use a lock file to set exact versions of transitive dependencies. n/a = no equivalent of a lock file. v = Go includes projects with a "vendor" directory, which has a similar effect to a lock file.
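The four constraint categories in Table 7(a)–(d) can be recovered mechanically from declared dependency strings. The sketch below classifies npm-style constraint syntax; it is a simplification for illustration, not the libraries.io pipeline actually used for the table:

```python
import re

def classify_constraint(spec: str) -> str:
    """Bucket an npm-style version constraint into Table 7's categories."""
    spec = spec.strip()
    if spec in ("", "*", "latest"):
        return "unconstrained"
    if re.fullmatch(r"\d+\.\d+\.\d+", spec):
        return "exact"
    if spec.startswith(">="):
        return "min only"
    # Carets, tildes, x-ranges, and explicit ranges all bound both ends.
    return "range"

for spec in ("1.2.3", ">=1.0.0", "^1.4.0", "*"):
    print(f"{spec!r:>10} -> {classify_constraint(spec)}")
```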
Additional values from open-ended questions. We also asked an open-ended question about other values important to their ecosystem. Common themes are counted in Table 8. Answers included usability (15 responses) and social benevolence (good conduct, altruism, empowerment, making resources available to all; 17 responses). An interesting pair of contrasting values we had not considered was standardization (12 responses) versus technical diversity (17 responses). Technical-diversity advocates valued the freedom to implement things and to interact with other developers in a diversity of ways: "the package creator should be in charge of deciding how best to manage his/her package and organize with other contributors […]" (Node.js/NPM respondent), while standardization advocates said their ecosystem limited choice to save developers time and effort by promoting wide adherence to standards: e.g., a Python respondent said the platform's "open ecosystem proposes commonly used, sensible ways to solve popular problems, enforces de facto standards" and decried the chaos of "NIH [Not Invented Here] syndrome."

Table 8. Number of Respondents Suggesting Other Ecosystem Values: Usability, Social Benevolence, Standardization, Technical Diversity, Documentation, Modularity, Testability

| Ecosystem | Usability | Social Benevolence | Standardization | Technical Diversity | Documentation | Modularity | Testability |
|----------------------------|-----------|--------------------|-----------------|---------------------|---------------|------------|-------------|
| Atom (plugins) | | | 1 | | | | |
| CocoaPods | 2 | 2 | | | | | |
| Eclipse (plugins) | | | | | | | |
| Erlang,Elixir/Hex | 1 | 1 | 1 | | | | |
| Go | 1 | 4 | 4 | 2 | 1 | 1 | |
| Haskell (Cabal/Hackage) | | | | | | | |
| Haskell (Stack/Stackage) | | | | | | | |
| Lua/Luarocks | | | 1 | | | | |
| Maven | | | | | | | |
| Node.js/NPM | 1 | 1 | | | 3 | 7 | |
| NuGet | | | | | | | |
| PHP/Packagist | | | | | | | |
| Perl/CPAN | 2 | 2 | 3 | 5 | 2 | 1 | 5 |
| Python/PyPi | 1 | 2 | 1 | 2 | 2 | | |
| R/Bioconductor | | | | | 4 | | |
| R/CRAN | | | | | | | |
| Ruby/Rubygems | 3 | 3 | 2 | 2 | 4 | | |
| Rust/Cargo | 1 | 1 | | | 1 | 1 | |
| other | 1 | 1 | 1 | 1 | 1 | | |

Other responses to this question we deemed not to be ecosystem values as such, but rather favored technical qualities of code at the package level (64 responses) that might be promoted by ecosystem culture, such as good documentation (11 responses; 4 of which were from Bioconductor participants), high modularity (16 responses; 7 of them in Node.js/NPM), and testability (11 responses; 4 each in Ruby and Perl). Finally, 13 (8%) responses objected to the framing of the question, claiming either that no community existed that could be said to share values (5 respondents, 3 of them in Maven) or that multiple subcommunities existed with differing values (8 respondents, including 2 in Erlang/Hex and 2 in Haskell/Cabal).

Other recent surveys [34, 77] have used similar sets of values. In light of responses to our survey, we propose the revised list of values in Appendix C.
This new list adds the new values of Standardization, Technical Diversity, Usability, and Social Benevolence, and removes Quality (since it did not distinguish among ecosystems).

Change planning practices. Participants across all ecosystems indicated in the survey (Figure 3) that they perform breaking changes only rarely: a median of less than once a year, both for the changes that our participants perform (Figure 3(a)) and for the breaking changes that their package faces from dependencies (Figure 4(a)). Although prior research suggests that breaking changes are "frequent" (Section 2), this is relative to the overall frequency of change. Applying a back-of-envelope estimate to Decan et al.'s [21] findings, for example: they report that about 5% of updates actually caused breakages, against a background rate of about 1.2 updates per year per package (1,029 updates to 1,710 packages in a six-month window), or one breakage every 17 years. Given that breakages may not be evenly distributed, that packages have multiple, recursive dependencies, and that developers work on multiple packages, experiencing a breakage once a year is in the range of plausibility.
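Written out, that back-of-envelope estimate is:

```latex
\underbrace{\frac{1029 \text{ updates}}{1710 \text{ packages} \times 0.5 \text{ yr}}}_{\approx 1.2 \text{ updates/package/yr}}
\times \underbrace{0.05}_{\text{breaking fraction}}
\approx 0.06 \text{ breakages/package/yr}
\;\Rightarrow\; \text{one breakage every } 1/0.06 \approx 17 \text{ years.}
```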
This is perhaps why developers' actual experience of dealing with a breaking change may be infrequent even if breaking changes are frequent in the ecosystem as a whole.

Table 9. Comparison of Sanctioned Practices and Features

| Ecosystem | (a) Dependencies outside repository | (b) Central Repository | (c) Access to old dependency versions | (e) Gatekeeping standards | (f) Synced ecosystem |
|----------------------------|-------------------------------------|------------------------|--------------------------------------|--------------------------|----------------------|
| Atom (plugins) | ● | ● | ● | ● | ● |
| CocoaPods | ● | ● | ● | ● | ● |
| Eclipse (plugins) | ● | ● | ● | ● | ● |
| Erlang,Elixir/Hex | ● | ● | ● | ● | ● |
| Go | ● | ● | ● | ● | ● |
| Haskell (Cabal/Hackage) | ● alt repo | ● | ● | ● | ● |
| Haskell (Stack/Stackage) | ● | ● | ● | ● | ● |
| Lua/Luarocks | ● | ● | ● | ● | ● |
| Maven | ● | ● | ● | ● | ● |
| Node.js/NPM | ● | ● | ● | ● | ● |
| NuGet | ● alt repo | ● | ● | ● | ● |
| Perl/CPAN | ● alt repo | ● | ● | ● | ● |
| PHP/Packagist | ● | ● | ● | ● | ● |
| Python/PyPi | ● | ● | ● | ● | ● |
| R/Bioconductor | ● alt repo | ● | ● | ● | ● |
| R/CRAN | ● alt repo | ● | ● | ● | ● |
| Ruby/Rubygems | ● | ● | ● | ● | ● |
| Rust/Cargo | ● | ● | ● | ● | ● |

● = ecosystem has feature, ○ = does not have feature, □ = has feature, but for a group of packages, not for individual packages. alt repo = through reference to an alternative repository; staged releases = groups of packages are debugged together and released as a group. submitter = the author, not the package, is vetted. core = core packages only. See Section 3.3.5 for details.

Respondents in every ecosystem agreed, on average, that they use semantic versioning or comparable versioning strategies (Figure 3(f)), batch multiple changes into a single release (Figure 3(d)), document their changes (Figure 3(e)), and are conservative about adding dependencies to their projects (Figure 4(c)). These seem to be generally considered good software-engineering practices independent of programming language or ecosystem.

Answers that varied more dramatically among ecosystems included reluctance to make breaking changes (Figure 3(b)), willingness to compromise design for backward compatibility (Figure 3(c)), and synchronizing with users before releasing changes (Figure 3(h)). Data mining reveals that ecosystems also vary considerably in how often they make updates to previous versions, ranging from 25% of Maven projects doing this at least once down to 0.1% of R/CRAN projects.

Turning to shared community resources, all but two of the ecosystems we studied supply a central repository server from which packages can be downloaded automatically as needed (Table 9(b)). Two (Go and Eclipse) only maintain indexes to maintainers' own servers, which must supply the package and metadata in some standard way. Advertised submission requirements show that ecosystems differ in the level of vetting that these repositories apply to packages (Table 9(e)). Haskell's Cabal/Hackage system is unusual in that it vets maintainers, who apply for accounts that are hand-checked by human reviewers, but applies no more than minimal automated standards to submitted packages. CRAN has very strict standards for package submissions and updates\(^{21}\), which are vetted by hand as well as by automated tests.

Three ecosystems are released all at once on a regular, synchronized schedule (Table 9(f)): the core set of packages in Eclipse, as well as the whole of Bioconductor (synchronized with releases of the R runtime), and CPAN. These work by having a staged sequence in which a development build is worked on until it is consistent; then parts or all of it are released as a group into the official supported release. Other ecosystems allow developers to release packages whenever their authors wish. This is similar to the practices of operating-system-level software ecosystems, such as Debian's APT\(^{22}\), which repackage software from a variety of languages and ecosystems into compatible releases for an operating system.

Note that Stackage's sets of compatible packages are curated together post hoc\(^{23}\); their development is not synchronized unless developers collaborate on their own to do so.

Practices for coping with dependency changes. Sixteen of the 18 ecosystems offer an optional (Table 9(a)) but widely used central repository (Table 9(b)) for packages, usually encouraging packages to refer to dependencies by name and version number.

When asked specifically about their package's exposure to breaking changes from upstream packages, participants across all ecosystems again reported low frequencies (Figure 4(a)); only a quarter of our participants indicated experiencing as much as one breaking change per year. Participants in ecosystems with more conservative change practices (e.g., Eclipse, Erlang, Perl) are exposed to slightly fewer breaking changes.
Participants across all ecosystems indicated that they are conservative in adding dependencies (Figure 4(c)) and perform significant research first (Figure 4(d)). In contrast, how they learn about updates (Figures 4(e)–(g); e.g., through personal contacts or tools), the rate at which they may skip them (Figure 4(h)), and how they declare version constraints on dependencies (Figure 4(i)) depend significantly on the ecosystem.

Data mining (Table 7) reveals that file cloning is rare (less than 10% of projects) in every ecosystem in which we measured it; developers instead rely on the package dependency infrastructure (Table 7(e)). Mining also confirmed survey answers about how users of packages choose to constrain the versions of packages they depend on: while Maven almost universally relies on a fixed version number (e.g., package A might depend on precisely version 3.2.1 of package B), other ecosystems typically constrain dependencies to version number ranges (Node.js/NPM, Atom, PHP, and Rust/Cargo), specify only a minimum version (NuGet, Ruby/RubyGems), or leave versions unconstrained (Python/PyPi, R/CRAN). Survey and mining results differed for one ecosystem, however: Perl/CPAN users claimed the ecosystem's typical practice was to specify just the name (43% of respondents) or a version range (36%) of dependencies, yet mining of libraries.io revealed nearly 100% use of exact version numbers. This may be a matter of developer perception: libraries.io apparently measures the precise dependencies captured in the published repository, but tools such as Dist::Zilla::Plugin::DistINI generate these from the less-constrained numbers specified by developers.

---

\(^{21}\)https://cran.r-project.org/web/packages/policies.html.
\(^{22}\)https://wiki.debian.org/Apt.
\(^{23}\)https://github.com/commercialhaskell/stackage#frequently-asked-questions.

Universal or distinctive. While there is considerable nuance in the differences among ecosystems, overall our results suggest that several values seem to be universal, at least in the 18 ecosystems we surveyed. Chief among these are stability, quality, and community, while compatibility, rapid access, and replicability have achieved near-universal status. The unique personality of each ecosystem, however, seems to derive from a few key distinctions (in values or in practices) that set it apart. There are many examples of this, including:

- **Bioconductor** and **Eclipse** stand out in coordinating releases on a synchronized and fixed schedule (Figures 3(i) and (j), Table 9(f)) and in valuing **curation** (Figure 2, Table 9(e)).
- **Go** has a distinctive version numbering practice that does not require version updates on all changes (Figure 3(g), Table 9(c)).
- **CRAN** and **Bioconductor** have strict requirements for submission and update of packages (Figure 3(k), Table 9(e)).
- **Lua** developers value **fun**, feel least constrained from making changes in their code, and generally do not coordinate much with others (Figures 3(b), (h), and (i)).
- **Rust** has a strong stance on **openness** and is the least prone to make design compromises for backward compatibility (Figures 3(b) and (c)).
Data mining of Cargo projects shows that they rarely port fixes to earlier code releases (Table 7(g)).
- **CPAN** developers universally claim to write change logs (Figure 3(e)).

Value differences by ecosystem are statistically significant for each of the values (Kruskal–Wallis, run separately on each value to check whether it differs by ecosystem: \( p < 0.00001 \), \( \chi^2 \) ranging from 53.704 for **quality** to 178.69 for **commerce**).

**Summary of RQ2.1 results:** Stability, quality, community, compatibility, rapid access, and replicability are important across all ecosystems, while openness, curation, standardization, and technical diversity are values that are not universal but differ by ecosystem. Breaking changes are experienced only rarely by any one developer (on the order of yearly), even though they are common within an ecosystem as a whole. Differing ecosystem circumstances lead to great variety in developers' willingness to make breaking changes or, conversely, to compromise their designs to ensure backward compatibility, and in turn in consumers' eagerness to incorporate upstream changes.

### 5.3 Study 2 Results: To What Extent Is There a Consensus within Ecosystems about Values and Practices? (RQ2.2)

The distribution of value ratings **within** each ecosystem was particularly wide for the values **replicability**, **openness**, and **curation**, indicating generally less consensus on these values. There is evidence of broad consensus about the highest ranked value(s) for some ecosystems (Table 10), most conspicuously in cases in which a value clearly aligns with the core purpose of an ecosystem. An illustrative example is Stackage and Cabal/Hackage, two Haskell-based ecosystems that contrast strongly with each other in **compatibility** and **curation**; participants rated these values as much more important in Stackage than in Hackage/Cabal. Stackage was also rated markedly lower in **rapid access** than all other ecosystems. These values are consistent with the stated goals of Stackage ("to create stable builds of complete package sets"). Stackage is built on top of Cabal for the express purpose of curating compatible sets of versions, while Hackage submissions only require that they be submitted by a developer whose identity has been manually vetted (Table 9(e)). Volunteer curators wait until a set of consistent package versions can be assembled and release them as a unit, trading rapid release for tested compatibility.
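Table 10 below also reports a consensus statistic \(C_n\): the percent of respondents in an ecosystem who did not rate any value higher than the ecosystem's top \(n\) values. A minimal sketch of that computation on hypothetical ratings (this illustrates the definition, not our analysis scripts):

```python
def consensus(ratings: list[dict[str, int]], top_values: list[str]) -> float:
    """Percent of respondents who rated no value above the ecosystem's top ones.

    `ratings` holds one dict per respondent mapping value names to ratings;
    `top_values` is the ecosystem's top-n list from Table 10. We read "did not
    rate any value higher than any of the highest n values" as: no rating
    exceeds the lowest rating the respondent gave to those top values.
    """
    agreeing = sum(
        1 for r in ratings
        if max(r.values()) <= min(r[v] for v in top_values)
    )
    return 100.0 * agreeing / len(ratings)

# Two hypothetical respondents rating three values on a 1-5 scale.
respondents = [
    {"stability": 5, "fun": 3, "innovation": 4},
    {"stability": 4, "fun": 5, "innovation": 2},
]
print(consensus(respondents, ["stability"]))  # 50.0: the second rates fun higher
```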
Table 10. Values Most Commonly Rated Highest, by Ecosystem

| Ecosystem | Top 3 values | Consensus in % |
|-----------------|-------------------------------|----------------|
| Haskell/Stack | compatibility > replicability > curation | 75 55 45 |
| Perl/CPAN | stability > replicability > quality | 64 40 31 |
| Maven | replicability > stability > quality | 64 38 32 |
| Lua/Luarocks | fun > replicability > quality | 64 35 17 |
| Eclipse | stability > compatibility > quality | 62 48 37 |
| NuGet | replicability > compatibility > stability | 59 37 20 |
| Go | quality > stability > fun | 56 37 19 |
| R/Bioconductor | replicability > quality > compatibility | 52 32 26 |
| CocoaPods | quality > stability > compatibility | 52 30 17 |
| Rust/Cargo | replicability > stability > community | 51 31 23 |
| PHP/Packagist | quality > stability > compatibility | 50 32 23 |
| Node/NPM | rapid.access > community > innovation | 50 24 15 |
| Atom | rapid.access > fun > openness | 50 26 17 |
| Erlang | quality > fun > stability | 46 24 18 |
| Haskell/Cabal | quality > innovation > replicability | 43 17 8 |
| Python | replicability > quality > stability | 42 20 14 |
| Ruby | fun = community = rapid.access | 41 18 12 |
| R/CRAN | replicability > compatibility > innovation | 36 20 8 |

Consensus \(C_n\) is the percent of respondents in each ecosystem who did not rate any value higher than any of the ecosystem's highest n values. The top three values are listed for each ecosystem; > indicates the relative popularity of the values; = indicates ties.

The Stackage/Hackage choice is controversial in the Haskell community, which may make their perceived differences in values and practices more visible.

A few more examples include:

- **Maven** is primarily a build tool that comes with a centralized hosting platform for Java packages and was not designed as a collaborative platform. This purpose is reflected in strongly valuing **replicability** but least valuing **community**, **openness**, or **fun**.

- **Bioconductor** is a platform for scientific computation (specifically, analysis of genomic data in molecular biology) where **replicability** of research results is a key asset, but **commerce** is clearly not a focus.

- **Lua** is widely used as an embedded scripting language for games; prior work has shown that the culture of game developers is significantly different from that of application developers [58]; for example, game development communities value creativity and communication with designers over rigid specifications, which makes extensive automated testing impractical.

Others, like **R/CRAN**, have markedly less consensus, at least regarding the set of values that we surveyed.

Some, but not all, practice differences can be explained by enforced policies or design choices in platform tools. For example, **Node.js/npm** sets a version range for dependencies by default when a dependency is added (Figure 4(i)), **Bioconductor** and the core packages of **Eclipse** have a synchronized, central release (Figures 3(i) and (j), Table 9(f)), and Bioconductor and CRAN require reviews before packages are included in the repository (Figure 3(k), Table 9(e)). Some practices are supported by optional tooling in the ecosystem, such as tools to create notifications on dependency updates in the Node.js and Ruby communities (Figure 4(i); e.g., gemnasium and greenkeeper.io).
Other practices seem to be mere community conventions—for example, providing change logs is encouraged in the documentation of CPAN but not enforced, yet the practice is apparently universal (Figure 3(e)).

Interestingly, there are some cases of practices with surprisingly little consensus in some ecosystems, given what we know about tools and policies in those ecosystems. For example, 26.6% of Node.js respondents indicated that a "package has to meet strict standards to be accepted into the repository" (Figure 3(k)), even though that community's npm repository does not have any such checks (Table 9(e)) and in fact contains many junk packages. It may be that ecosystem members are not aware of the design space and of what practices other ecosystems employ, so they have a biased interpretation of what a "strict standard" is. Alternatively, participants may be members of subcommunities with contrasting values and practices. For example, there may be vetting of revisions among the developers within a specific project or subcommunity that is also hosted on npm.

The role of roles. We wanted to explore the possibility that survey respondents' differences in perceived values and practices might be explained by the role of a respondent in their ecosystem. The ecosystem may appear different depending on one's responsibilities and perspective. The survey asked people what their role was in the ecosystem; the choices were user, committer, submitter, package lead, central package lead (a.k.a. lead+), and founder. We analyzed how core (lead+ and founder) roles differed from the rest within each ecosystem. We suspected that core and peripheral ecosystem participants might have different values, but we found little evidence that this was the case. We tested their ratings of all 11 perceived values and found a statistically significant difference for only one value, replicability (t-test, p = 0.044, n = 1,504); however, this difference was small (an average rating of 3.5 out of 5 for core versus 3.68 for non-core, a difference of 0.18 scale points), and there was no evidence that value perceptions differed for the other values (t-test, p between .13 and .73, n ranging from 1,492 to 1,504).

Core people did seem to be more enmeshed in the community than those in other roles, in the sense that they were more likely to collaborate with upstream packages ($\chi^2(1, N=932) = 16.571$, $p < .0001$; 21% more likely to answer yes to the question, "In the last 6 months I have participated in discussions, or made bug/feature requests, or worked on development of another package in <ecosystem> that one of my packages depends on."), to contribute code to upstream dependencies ($\chi^2(1, N=925) = 24.132$, $p < .0001$; 18% more likely to answer yes to the question, "Have you contributed code to an upstream dependency of one of your packages in the last 6 months (one where you're not the primary developer)?"), and to claim to know their users' needs ($\chi^2(1, N=932) = 62.947$, $p < .0001$; 29% more likely to answer "Strongly" or "Somewhat agree" to the question, "I know what changes users of <package> want"). People in core roles also felt very slightly more confident in their answers to the community values questions ($\chi^2(1, N=932) = 6.2247$, $p < .05$; 8% more likely to answer "Confident" or "Very confident" to the question, "How confident are you in your ratings of the values of <ecosystem> above?"); this difference was statistically significant, but not very large.
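Each of these role comparisons is a test of independence on a 2×2 table of role (core vs. non-core) against a yes/no survey answer. A sketch with invented counts (the real sample sizes appear in the text above):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts: rows are core vs. non-core respondents,
# columns are "yes" vs. "no" answers to one collaboration question.
table = np.array([
    [130,  70],   # core:     yes, no
    [300, 432],   # non-core: yes, no
])

chi2, p, dof, _expected = chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.1f}, p = {p:.2g}")
print(f"difference in yes-rates: {130 / 200 - 300 / 732:.0%}")
```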
In short, a few features do distinguish core community members from the rest, but core members seem to be culturally a part of their communities in that they perceive the same values.

**Summary of RQ2.2 results:** Ecosystems tend to have many of the same values but distinguish themselves by virtue of a few distinctive values strongly related to their purpose and audience. Consensus in practices is largely, but not entirely, driven by the affordances of shared tooling and the policies that it enforces or encourages. Core and peripheral members of an ecosystem's community share their ecosystem's values, but core members are more collaborative in their practices.

5.4 Study 2 Results: What Is the Relationship between Values and Practices: The Case of Stability (RQ2.3)

One might expect that ecosystems that share similar values would adopt similar practices that support those values, but for most practices that is not the case. We averaged each value and practice answer within each ecosystem, yielding for each of the 18 ecosystems one row of mean answers, and looked for correlations between each value column and each practice column across these rows. There were few strong correlations between values and practices: out of 418 such value-practice comparisons\(^{24}\), only 29 were significantly correlated (Spearman test, \( p < 0.05 \)); however, even these may be due to chance, since given the small sample size (\( n = 18 \)) and the large number of comparisons, applying a Holm-Bonferroni correction rules out taking any of these correlations as conclusive.

---

\(^{24}\)Figures 3(b)–(k) and Figures 4(c)–(h) and (j), and the four answers of Figure 4(i) taken separately.

The fact that practices are not universally associated with particular values implies that the same value can be associated with the adoption of different practices. For example, of the practices shown in the violin plots above, only one, the perception of the ecosystem's use of exact version numbers to refer to dependencies (Figure 4(i), choice E), significantly correlated with the perceived value of stability to the ecosystem (Spearman correlation of mean answers within each ecosystem: \( \rho = 0.506, p < .05, n = 18 \) ecosystems). We investigate this relationship further with a comparison of the practices associated with stability in three ecosystems that had high ratings and high consensus for stability: Eclipse, Perl, and Rust (Figure 2 and Table 10). Our survey results indicate that these ecosystems achieved stability with different, sometimes nearly opposite, practices.

- **Eclipse: stability through strict standards and gatekeeping.** Eclipse's leadership very strongly promotes stable plugin APIs. As we mentioned earlier, official developer documentation includes this "prime directive": "When evolving the Component API from release to release, do not break existing Clients" [25]. Eclipse developers rated stability higher than any other ecosystem, with the smallest variance in their mean ratings of stability (Figure 2), and with strong consensus that stability was the highest value (cf. Table 10).
 Survey answers about practices show that Eclipse relies on gatekeeping (Figure 3(k)) and that its developers claim to make design compromises to achieve backward compatibility (Figure 3(c)); they police each other's backward compatibility and release together when they can be sure they will not break legacy code (Figure 3(i)); and developers feel constrained in making changes (Figure 3(b)).

- **Rust: stability through dependency versioning and stability attributes.** Rust, in contrast, ranked lowest in design compromises for backward compatibility (Figure 3(c)) and rarely maintains outdated versions (Table 7(g)), but it ranks high in semantic versioning (Figure 3(f)). Rust's Cargo infrastructure prevents the use of wildcards for dependency versions, although it allows ranges (Figure 4(i)), which are almost universally used (93.6% of Cargo packages, Table 7(c)). Users were thus prodded to stay on older versions of dependencies, rather than letting their tools upgrade them automatically and burdening upstream packages with bug reports when things change. Other stability features include a "lock" file that records the exact versions of dependencies used by a release (Table 7(f)), and a feature called "stability attributes," which tags API elements that are guaranteed to be stable, in contrast to new features that might change [80].

 Survey results show that Rust developers acknowledged the community's stated value of stability (Figure 2), despite the fact that participants also perceived the ecosystem's packages to be in fact relatively unstable (Figure 4(b)). The Rust language developers had been consistent in promising stability for the "stable" branch of the language, to the extent that they test any compiler changes against the entire corpus of Rust programs they can find on GitHub. But their analysis of their community's 2016 user survey [79] summarized why many users complained about instability: too many packages ("crates") relied on unstable "nightly" development versions of the compiler to take advantage of interesting new features. They concluded that "consensus formed around the need to move the ecosystem onto the stable language and away from requiring the nightly builds of the compiler."

- **CPAN: stability through centralized testing.** Finally, Perl, unlike Rust, is low in semantic versioning (Figure 3(f)) and in fact was the ecosystem most likely to claim to refer to dependencies by name only, not version number (Figure 4(i)). Its developers indicate some gatekeeping and design compromises, but not to the extent of Eclipse (Figures 3(c) and (k)). However, in response to the open-ended question about what other values were not covered by the survey, 12 (40%) of the 30 Perl/CPAN participants who gave comments mentioned testability\(^{25}\), many referring to Perl's extensive battery of tests run on CPAN packages by volunteers; one explicitly claimed this test facility helped with the stability of Perl packages. CPAN stages changes and releases packages together (Table 9(f)), almost entirely specifying fixed version numbers of their dependencies (Table 7(a)).
A Haskell/Hackage participant mentioned CPAN's kwalitee metric, an operationalization of quality employed by these testing facilities, and attributed it to the ecosystem's "focus on stability and compatibility."

The three ecosystems work towards stability in very different ways. Eclipse, with its long-standing corporate support, is able to dictate that upstream developers pay the cost of maintaining backward compatibility. Rust/Cargo, although its users clamor for stability, is eager to attract developers and cannot impose the cost of stability by fiat as Eclipse does; instead, it applies gentle pressure on upstream developers in various ways, while easing the pressure on downstream developers by discouraging automatic major updates. CPAN, finally, has a large cadre of volunteers (the CPAN Testers) and purpose-built infrastructure that take on the task of thorough testing.

This comparison of stability practices demonstrates that the relationships between practices and values are context-dependent and thus hard to generalize. A comprehensive theory incorporating such insights is a task for future work. We hope our dataset and the questions it suggests provide a useful launching point. Contrasts revealed by the survey are ripe for further investigation: researchers can find appropriate subjects for case studies of values being pursued in contrasting ways or, conversely, of practices associated with contrasting values. In this case, analyzing the differences between these three ecosystems suggests that a theory of how practices can further values should take into account other factors, including the presence, availability, and motivations of different kinds of developers. This should be confirmed, however, with more exhaustive study of these and other ecosystems and with other practice contrasts. Ecosystem communities dissatisfied with their practices can themselves use the survey as a starting place to find alternative combinations of practices that others are using.

---

\(^{25}\)Testability was not a value we surveyed, but we recommend it as a new value in an expanded list, since many survey takers suggested it.

**Summary of RQ2.3 results:** Many ecosystems show clear distinctions in a few key values and practices. Consensus on the most important values is often high, and some practices are actually enforced by policies and platform tools. However, some values, particularly **quality**, are nearly universal among software engineers, with little variance among ecosystems. Breaking changes are also generally avoided, though the strategies by which this is achieved, and how difficult it is perceived to be, depend on the specifics of the ecosystem.

### 6 DISCUSSION AND FUTURE WORK

Our article makes several contributions toward understanding how ecosystems go about the critical task of managing breaking changes and how those practices reflect the culture and values of the ecosystem participants. Study 1 contributes a qualitative accounting of the very different ways in which three contrasting ecosystems manage change and of how these differences relate to different values and different ideas about which classes of participants should bear the costs. Prior work [19, 36, 67, 72] has examined particular practices for change management and noted the prevalence of breaking changes [22, 48, 54, 90].
Our contribution is to characterize the types of change negotiation practices found in three different ecosystems, and to show how these different sets of practices require varying amounts of effort from different classes of ecosystem participants. We also show how these different sets of practices reflect ecosystem values about the software, the community, and which community needs take precedence. Study 2 builds on this, examining practices and values in a larger set of 18 ecosystems. We find that some values appear to be universal, or nearly so, within this set of ecosystems, perhaps reflecting a broader open source culture. Other values show considerable divergence, which appears to be a substantial component of ecosystems' distinctive "personalities." Within ecosystems, some values appear to reflect a consensus among participants, while views of others are highly variable, perhaps reflecting the diverse views of subsets of projects or individuals rather than ecosystem-wide values. We also show that the relationship between practices and values is not simple, and we illustrate the apparent nature of such relationships by contrasting the very different practices that several ecosystems employ in pursuit of stability, which all of them value highly.

In the following subsections, we outline new and interesting research questions brought to light by this work.

#### 6.1 When Are Practices in Conflict or Complementary?

It seems highly unlikely that practices can be treated as independent of one another. If an ecosystem is considering adopting a new practice, e.g., to enhance stability, the outcome of trying to implement various stability-enhancing practices is likely to be contingent on the set of other practices already in place. For example, introducing semantic versioning to signal breaking changes would not make sense where snapshot consistency (current versions of everything must be compatible) is already enforced. Complementarity is the other side of the coin: certain practices may be more effective if certain other practices are adopted as well. For example, centralized testing is likely to be more effective where an ecosystem has a repository with a strong gatekeeping mechanism and a norm that dissuades developers from using alternative repositories.
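To make the substitution argument concrete, here is a minimal sketch of the snapshot-consistency check mentioned above; the package names, versions, and constraints are invented for illustration, and the `packaging` library again stands in for a real resolver.

```python
# Sketch: "snapshot consistency" means the current version of every
# package must satisfy every requirement declared against it.
from packaging.specifiers import SpecifierSet
from packaging.version import Version

current = {"core": Version("2.1.0"), "plugin": Version("1.4.2")}
requires = {"plugin": {"core": SpecifierSet(">=2.0,<3.0")}}

def snapshot_consistent() -> bool:
    """True iff every declared requirement is met by the current snapshot."""
    return all(
        current[dep] in spec
        for deps in requires.values()
        for dep, spec in deps.items()
    )

print(snapshot_consistent())  # True: the snapshot can ship as a unit

# A breaking core 3.0.0 is caught at the repository gate (check fails),
# before any downstream package can observe it.
current["core"] = Version("3.0.0")
print(snapshot_consistent())  # False
```

Where such a check gates every release, the repository itself guarantees that the current snapshot installs cleanly, so a semantic-versioning signal to downstream users carries little additional information.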
Alternatively, the alignment may come about primarily through value-based selection, where developers join ecosystems because they resonate with the ecosystem's values.

These two possibilities will often carry different implications for interventions. If developers tend to assimilate the ecosystem's values, an existing community might be steered toward different practices, with the expectation that developers will adapt over time. In contrast, if developers pick ecosystems based on compatible values, then substantial changes would likely attract new value-aligned developers but risk significant disruption if long-term contributors rebel or leave. While one might expect some degree of both selection and assimilation, understanding which values and practices are more easily adapted, and which tend to be resistant to change, could be a big help in designing effective interventions.

Our survey data does not provide insights into causation, but it can provide starting points for further investigations and can be combined with external data to approach these questions. We took a small step in this direction to illustrate some of the possibilities. If developers tend to assimilate practices and values from those around them, we would expect values and practices to be shared more among ecosystems with a relatively large overlap of participating developers than among those with a relatively small overlap. As a preliminary study, we investigated whether ecosystems that share many developers\(^{26}\) have similar practices or values. Over all pairs of ecosystems, we found a sizable correlation between similarity of average responses on ecosystem practice questions (those depicted in Figures 3 and 4) and overlap in committers to those ecosystems (Spearman \(\rho = 0.341\), \(p < .00001\), \(n = 289\) pairs of ecosystems, correlating similarity of average practice responses for each pair of ecosystems with developer overlap between them). Interestingly, perceived values of the ecosystem do not seem to align with developer overlap (\(\rho = -0.05\), \(p = 0.44\), \(n = 289\), correlating similarity of average perceived ecosystem values for each pair of ecosystems with developer overlap between them).

While a number of interpretations of these relationships are possible, the data are consistent with the idea that practices diffuse among ecosystems that have large developer overlap, but values do not. Future work using time series data about developer overlap and historic participation in ecosystems would allow researchers to identify specific developers who moved to ecosystems with different or similar practices and values (according to our survey data) and use interviews, surveys, or data mining to see if and how their behavior changed.

\(^{26}\)To measure developer overlap, we assembled a list of all packages in each ecosystem from libraries.io, Cargo.io, and LuaRocks.com, and we identified Eclipse plugins as non-fork packages in GitHub containing a "plugin.xml" file. Using the authors of commits to those packages' GitHub projects as archived by Mockus [57], we counted what percentage of each ecosystem's contributors also contributed to each other ecosystem. We excluded Bioconductor, because we had no clear mapping to GitHub repositories.
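The shape of the preliminary overlap analysis above can be sketched as follows. The response vectors, contributor sets, and the similarity measure (negative mean absolute difference) are invented stand-ins: the actual analysis used the survey responses and the commit histories archived in [57].

```python
# Sketch: correlate similarity of ecosystems' average practice responses
# with (asymmetric) committer overlap, over ordered pairs of ecosystems.
from scipy.stats import spearmanr

avg_practice = {                     # mean response per practice question
    "eco_a": [4.1, 2.3, 3.8],
    "eco_b": [3.9, 2.1, 3.6],
    "eco_c": [1.2, 4.8, 2.0],
}
committers = {
    "eco_a": {"dev1", "dev2", "dev3"},
    "eco_b": {"dev2", "dev3", "dev4"},
    "eco_c": {"dev5"},
}

def similarity(a: str, b: str) -> float:
    """Negative mean absolute difference: higher means more similar."""
    xs, ys = avg_practice[a], avg_practice[b]
    return -sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

def overlap(a: str, b: str) -> float:
    """Fraction of a's committers who also commit to b."""
    return len(committers[a] & committers[b]) / len(committers[a])

pairs = [(a, b) for a in avg_practice for b in avg_practice if a != b]
rho, p = spearmanr([similarity(a, b) for a, b in pairs],
                   [overlap(a, b) for a, b in pairs])
print(f"Spearman rho = {rho:.3f}, p = {p:.3f}, n = {len(pairs)} pairs")
```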
#### 6.3 When Are Attempted Changes Broadly Adopted?

Collecting cases of effective and ineffective past changes in ecosystems can help to identify the conditions that favor broadly adopted changes. Examples of attempted policy or practice changes can often be found through surveys. In our survey, text answers about contrasting ecosystems often explained how practices were deliberately designed. Five Perl developers, for example, described how an extensive centralized testing infrastructure (CPAN Testers) was added to improve the quality and compatibility of CPAN modules. Beginning with our results and then conducting new interviews or surveys, it should be possible to unearth many examples of attempted change and to determine their outcomes. A second approach could identify conflicts between values and practices as signals of changes that have not been effective. In the case of Rust, for example, the high value placed on stability (Figure 2), combined with the high perceived instability (Figure 4(b)), led us to investigate Rust's struggle, as mentioned above, to promote practices leading to stable versions of libraries despite the community's eagerness to innovate with new features.

In Edgar Schein's work on organizational culture, his recommendations [70, p. 323ff] for changing an organization include strong role models for new behaviors, lowering learning anxiety, and raising survival anxiety (i.e., making people confident that they can learn new practices and aware that the community will fail if they do not). Elements of this advice are visible in the practices of ecosystems that have tried to change their values. In Rust, for example, the compiler team models stability practices that packages might follow [80]. Rust's stability attributes for packages may reduce learning anxiety by making it easier for downstream users to create stable interfaces, and Rust's annual survey helps developers see each other's agreement that there are problems with stability.

### 7 CONCLUSION

While managing change has long been an important topic in software engineering, it is particularly interesting in the context of open source ecosystems, since projects tend to be highly interdependent yet independently maintained. The variety of practices used to manage change is considerable, but perhaps most interesting is what we might think of as the political dimension of the selection of practices. Whose interests are served by the adoption of one set of practices rather than others? How are the costs (primarily effort) distributed over types of ecosystem participants? What values do these practices actually serve?

We have attempted to provide a somewhat detailed description of the practices used in three ecosystems, as well as a broader characterization of 18 ecosystems. We believe these studies just scratch the surface, however, and much work remains to be done in understanding how practices fit with values, and with each other, and how effective changes can be made to address ecosystem weaknesses. We hope through this work, and through the data we are making publicly available, to have contributed to a better understanding of these issues.

APPENDICES

A STUDY 1 INTERVIEW PROTOCOL

The following lists the questions from our interview script. We did not ask every question of each interviewee; instead, we directed interviewees towards areas where they had personal experience.
Given our iterative approach, some questions in this script were added or modified after earlier interviews.

For maintainers of upstream packages:

- Why do you work on `<package1>`?
- Do you have any plan or strategy for how the interface of `<package1>` will evolve as people come to depend on it?
- Think about a recent larger change in your project. Was it backward-compatible? What impact did you expect it would have on packages that depend on `<package1>`?
  — Follow up: Did you consider alternative ways of making `<change1>` that would have more or less impact on users of `<package1>`?
  — Follow up: If you had not made `<change1>`, what would have happened differently for `<package1>`'s future?
  — Follow up: What is your position on backward compatibility?
- Does the platform help or hinder you in evolution decisions such as `<change1>`? What if the platform had mechanism `<alternative mechanism>`?

For developers with upstream dependencies:

- Why do you work on `<package1>`?
- If there's a useful-looking package that claims to provide some functionality you need, how do you decide whether to adopt it?
- What's your general strategy for choosing which version of a package to depend on?
- When do you think it's reasonable and expected for a package to change its interface?
- Do you prefer a stable but stale dependency, or a rapidly evolving but unstable one? What rate of interface change is too often?
- Is it a burden to have too many dependencies for a project?
- Can you give an example of a package you've considered where stability was a consideration (positively or negatively)?
- How do you keep up with changes to packages you depend on?
- When `<change1>` happened in `<upstream package1>`, how did you first find out about it?
- Do you ever watch for development activity between releases?
- Do you use the GitHub notification mechanism? Why or why not?
- If you could have an ideal notification system for important changes, what would it look like, and what changes would it notify you about?
- Did you think `<change1>` was an appropriate change, or should they have left it alone?

For developers with experience working on the platform, we asked questions about specific policies, their intentions, and their consequences. Here are some example questions about CRAN:

- CRAN differs from some other repositories in that it asks package authors to notify reverse-dependency packages before submitting an update that breaks its API.
  — Was there anything specific that precipitated that policy?
  — Did you consider other options for solving the problem? What were the tradeoffs you thought about?
  — How successful has that policy been so far?
- More generally, CRAN has stricter requirements for authors than some other package repositories do. What factors does the CRAN team take into consideration when deciding if a quality standard is worth the effort of instituting and enforcing?
- Bioconductor does coordinated releases of all the packages at once, while CRAN lets packages update on their own schedule.
  — How and why did the two repositories end up having different policies?
  — What have been the consequences for the two repositories?
  — Will they likely stay that way?
- CRAN makes it easy to install only the latest version of a package; some repositories let users install old versions.
  — Why is it done that way?
- CRAN has more permissive expectations about version number changes than some platforms do. Has the current system been sufficient, or have you considered altering the policies about numbering?
- Can you tell me something about how potential breaking changes are handled among the developers of the base and recommended packages?
  — How do developers communicate to coordinate and synchronize changes?
  — Does it work differently for base and recommended packages than among ordinary packages in the CRAN repository?

B STUDY 2 SURVEY QUESTIONS

For transparency and replicability, we list all evaluated questions of the survey, including their exact phrasing. We exclude a small number of questions about power structures, community health, and motivation that we did not use in this article.

Part I: Ecosystem.

• Please choose ONE software ecosystem* in which you publish a package**. If you don't publish any packages, then pick an ecosystem whose packages you use.
  * "Software ecosystem": a community of people using and developing packages that can depend on each other, using some shared language or platform.
  ** "Package": a distributable, separately maintained unit of software. Some ecosystems have other names for them, such as "libraries," "modules," "crates," "cocoapods," "rocks," or "goodies," but we'll use "package" for consistency.
  [selection or text field, substituted for <ecosystem> in remainder of survey]

Ecosystem Role.

• Check the statement that best describes your role in this ecosystem.
  — I'm a founder or core contributor to <ecosystem> (i.e., its language, platform, or repository).
  — I'm a lead maintainer of a commonly-used package in <ecosystem>.
  — I'm a lead maintainer of at least one package in <ecosystem>.
  — I have commit access to at least one package in <ecosystem>.
  — I have submitted a patch or pull request to a package in <ecosystem>.
  — I have used packages from <ecosystem> for code or scripts I've written.

• About how many years have you been using <ecosystem> in any way?
  — < 1 year
  — 1–2 years
  — 2–5 years
  — 5–10 years
  — 10–20 years
  — > 20 years

Ecosystem values.

• How important do you think the following values are to the <ecosystem> community? (Not to you personally; we'll ask that separately.) [See Section 3.3.2 for the 11 value questions; results shown in Figure 2.]

• How confident are you in your ratings of the values of <ecosystem> above?
  — Not confident
  — Slightly confident
  — Confident
  — Very confident

• Is there some other value the <ecosystem> community emphasizes that was not asked about above? If so, describe it here:

Part II: Package.

• In the following, we are going to ask about your experience working on one particular package. Please think of one package in <ecosystem> you have contributed to recently and are most familiar with. If you haven't contributed to a package in <ecosystem>, then name some software you've written that relies on <ecosystem> packages. You may use a pseudonym for it if you are concerned about keeping your responses anonymous.
\u2014 [text fields, substituted for <package> in remainder of survey]\n\n\u2022 Do you submit the package you chose to a/the repository associated with <ecosystem>? (Choose \u201cno\u201d if the ecosystem does not have its own central repository.) \u2014 [yes/no]\n\n\u2022 Is there any software maintained by other people that depends on the package you chose? \u2014 [yes/no]\n\n\u2022 Is the package you chose installed by default as part of a standard basic set of packages or platform tools? \u2014 [yes/no]\n\n\u2022 How important are each of these values in development of <package> to you personally? [See Section 3.3.2 for the 11 value questions.]\n\n\u2022 (OPTIONAL) Is there some other value important to you personally for <package> which was not mentioned? \u2014 [text fields]\n\n\u2022 How often do you face breaking changes from any upstream dependencies (that require rework in <package>)? [Results shown in Figure 4(a).]\n \u2014 Never\n \u2014 Less than once a year\n \u2014 Several times a year\n \u2014 Several times a month\n \u2014 Several times a week\n \u2014 Several times a day\n\n\u2022 How often do you make breaking changes to <package>? (i.e., changes that might require end-users or downstream packages to change their code) \u2014 [frequency scale as above][Results shown in Figure 3(a).]\n\nMaking changes to <package>.\n\n\u2022 I feel constrained not to make too many changes to <package> because of\n \u2022 potential impact on users. [Results shown in Figure 3(b).]\n \u2014 Strongly agree\n \u2014 Somewhat agree\n \u2014 Neither agree nor disagree\n \u2014 Somewhat disagree\n \u2014 Strongly disagree\n \u2014 I don\u2019t know\n\n\u2022 I know what changes users of <package> want. \u2014 [agreement+don\u2019t know scale as above]\n\n\u2022 If I have multiple breaking changes to make to <package>, I try to batch them up into a single release. \u2014 [agreement+don\u2019t know scale as above][Results shown in Figure 3(d).]\n\n\u2022 I release <package> on a fixed schedule, which <package> users are aware of. \u2014 [agreement+don\u2019t know scale as above][Results shown in Figure 3(j).]\n\u2022 Releases of <package> are coordinated or synchronized with releases of packages by other authors. \u2014 [agreement+don\u2019t know scale as above][Results shown in Figure 3(i).]\n\n\u2022 When working on <package>, I make technical compromises to maintain backward compatibility for users. \u2014 [agreement+don\u2019t know scale as above][Results shown in Figure 3(c).]\n\n\u2022 When working on <package>, I often spend extra time working on extra code aimed at backward compatibility. (e.g., maintaining deprecated or outdated methods) \u2014 [agreement+don\u2019t know scale as above]\n\n\u2022 When working on <package>, I spend extra time backporting changes, i.e., making similar fixes to prior releases of the code, for backward compatibility. \u2014 [agreement+don\u2019t know scale as above]\n\nReleasing Packages.\n\n\u2022 A large part of the community releases updates/revisions to packages together at the same time. \u2014 [agreement+don\u2019t know scale as above]\n\n\u2022 A package has to meet strict standards to be accepted into the repository. \u2014 [agreement+don\u2019t know scale as above][Results shown in Figure 3(k).]\n\n\u2022 Most packages in <ecosystem> will sometimes have small updates without changing the version number at all. 
\u2014 [agreement+don\u2019t know scale as above]\n\n\u2022 Most packages in <ecosystem> with version greater than 1.0.0 increment the leftmost digit of the version number if the change might break downstream code. \u2014 [agreement+don\u2019t know scale as above]\n\n\u2022 I sometimes release small updates of <package> to users without changing the version number at all. \u2014 [agreement scale, without \u201cdon\u2019t know\u201d][Results shown in Figure 3(g).]\n\n\u2022 For my packages whose version is greater than 1.0.0, I always increment the leftmost digit if a change might break downstream code (semantic versioning). \u2014 [agreement as above][Results shown in Figure 3(f).]\n\n\u2022 When making a change to <package>, I usually write up an explanation of what changed and why (a change log). \u2014 [agreement as above][Results shown in Figure 3(e).]\n\n\u2022 When working on <package>, I usually communicate with users before performing a change, to get feedback or alert them to the upcoming change. \u2014 [agreement as above][Results shown in Figure 3(h).]\n\n\u2022 When making a breaking change on <package>, I usually create a migration guide to explain how to upgrade. \u2014 [agreement as above]\n\n\u2022 After making a breaking change to <package>, I usually assist one or more users individually to upgrade. (e.g., reaching out to affected users, submitting patches/pull requests, offering help) \u2014 [agreement as above]\n\nPart IV: Dependencies.\n\n\u2022 In the last 6 months I have participated in discussions, or made bug/feature requests, or worked on development of another package in <ecosystem> that one of my packages depends on. \u2014 [yes/no]\n\n\u2022 Have you contributed code to an upstream dependency of one of your packages in the last 6 months (one where you\u2019re not the primary developer)? \u2014 [yes/no]\n\n\u2022 About how often do you communicate with developers of packages you depend on (e.g., participating in mailing lists, conferences, Twitter conversations, filing bug reports or feature requests, etc.)? \u2014 [frequency scale, as above][Results shown in Figure 4(f).]\nFor most dependencies that my packages rely on, the way I typically become aware of a change to the dependency that might break my package is:\n\n- I read about it in the dependency project\u2019s internal media (e.g., dev mailing lists, not general public announcements) \u2014 [agreement scale, as above]\n- I read about it in the dependency project\u2019s external media (e.g., a general announcement list, blog, Twitter, etc) \u2014 [agreement scale, as above]\n- A developer typically contacts me personally to bring the change to my attention \u2014 [agreement scale, as above][Results shown in Figure 4(e).]\n- Typically I get a notification from a tool when a new version of the dependency is likely to break my package \u2014 [agreement scale, as above][Results shown in Figure 4(f).]\n- Typically, I find out that a dependency changed because something breaks when I try to build my package. 
\u2014 [agreement scale, as above][Results shown in Figure 4(g).]\n- How do you typically declare the version numbers of packages that <package> depends \u2014 [Results shown in Figure 4(i).]\n - I specify an exact version number\n - I specify a range of version numbers, e.g., 3.x.x, or [2.1 through 2.4]\n - I specify just a package name and always get the newest version\n - I specify a range or just the name, but I take a snapshot of dependencies (e.g., shrinkwrap, packrat)\n- What is the common practice in <ecosystem> for declaring version numbers of dependencies? \u2014 [same scale as previous + \u201cdon\u2019t know\u201d]\n\nUsing or avoiding dependencies.\n\n- When adding a dependency to <package>, I usually do significant research to assess the quality of the package or its maintainers, before relying on a package that seems to provide the functionality I need. \u2014 [agreement scale, as above][Results shown in Figure 4(d).]\n- It\u2019s only worth adding a dependency if it adds a substantial amount of value. \u2014 [agreement scale, as above][Results shown in Figure 4(c).]\n- I often choose NOT to update <package> to use the latest version of its dependencies. \u2014 [agreement scale, as above][Results shown in Figure 4(h).]\n- When adding a dependency, I usually create an abstraction layer (i.e., facade, wrapper, shim) to protect internals of my code from changes. \u2014 [agreement scale, as above]\n- When working on <package>, I often copy or rewrite segments of code from other packages into my package, to avoid creating a new dependency. \u2014 [agreement scale, as above]\n- When working on <package>, I must expend substantial effort to find versions of all my dependencies that will work together. \u2014 [agreement scale, as above]\n- (OPTIONAL) Compare <ecosystem> with other ecosystems you\u2019ve used or heard about \u2013 does one have some features that the other should adopt? If so, name the other ecosystem(s) and describe the feature(s). \u2014 [text field]\n- (OPTIONAL) Why do you think people chose to design these other ecosystem(s) differently from <ecosystem>? \u2014 [text field]\n\nPart V: Demographics and motivations.\n\n- Age\n - 18\u201324\n - 25\u201334\n\u2014 35\u201344\n\u2014 45\u201354\n\u2014 55\u201364\n\u2014 65+\n\u2022 Gender \u2014 [male/female/other]\n\u2022 Formal computer science education/training\n \u2014 None\n \u2014 Coursework\n \u2014 Degree\n\u2022 How many years have you been contributing to open source? (in any way, including writing code, documentation, engaging in discussions, etc) \u2014 [same time scale as \u201cyears used ecosystem\u201d above]\n\u2022 How many years have you been developing or maintaining software? \u2014 [same as previous]\n\u2022 (OPTIONAL) Is there anything else we should have asked, that would help us better understand your experience with community values and breaking changes in <ecosystem> If so, tell us about it: \u2014 [text field]\n\nC SUGGESTED SET OF VALUES FOR FUTURE STUDIES\n\nWe propose the following list of values that appear to distinguish software ecosystems. 
They are derived from the Study 1 results plus examination of ecosystem webpages, then modified based on survey results, adding values that were suggested by survey respondents (Standardization, Technical Diversity, Usability, and Social Benevolence) and removing one that does not distinguish meaningfully among developers or ecosystems (Quality).

• **Stability:** Backward compatibility, allowing seamless updates ("do not break existing clients").
• **Innovation:** Innovation through fast and potentially disruptive changes.
• **Replicability:** Long-term archival of current and historic versions with guaranteed integrity, such that the exact behavior of code can be replicated.
• **Compatibility:** Protecting downstream developers and end-users from struggling to find a compatible set of versions of different packages.
• **Rapid Access:** Getting package changes through to end-users quickly after their release ("no delays").
• **Commerce:** Helping professionals build commercial software.
• **Community:** Collaboration and communication among developers.
• **Openness and Fairness:** Ensuring that everyone in the community has a say in decision-making and the community's direction.
• **Curation:** Selecting a set of consistent, compatible packages that cover users' needs.
• **Fun and personal growth:** Providing a good experience for package developers and users.
• **Standardization:** Promoting standard tools and practices, limiting developers' choices to save them time and effort.
• **Technical Diversity:** Allowing developers the freedom to develop and interact in a diversity of ways.
• **Usability:** Ensuring that tools and libraries are easy for developers to use, and that the resulting software is easy for end-users to use.
• **Social Benevolence:** An ethical community empowering others by making software and other resources available.

D LOCK FILE NAMES IN EACH ECOSYSTEM

| Ecosystem | Lock file | Notes |
|----------------------------|----------------------------|----------------------------------------------------------------------|
| Atom (plugins) | package-lock.json, npm-shrinkwrap.json | (see Node.js/NPM below) |
| CocoaPods | Podfile.lock | |
| Eclipse (plugins) | N/A | This function would be done within the project's regular metadata files (plugin.xml and pom.xml) and so could not be measured readily with this technique |
| Erlang, Elixir/Hex | mix.lock | |
| Go | Gopkg.lock, vendor/ | Preceding the Gopkg.lock file, a canonical method of locking down dependency versions was simply to include a snapshot of their source code, so we looked for a "vendor/" directory in the project. |
| Haskell (Cabal/Hackage) | cabal.config | |
| Haskell (Stack/Stackage) | cabal.config | Although possible, this was never used, since Stackage's main distinguishing feature is to constrain the versions of a set of packages |
| Lua/LuaRocks | N/A | We could not find evidence of a canonical or even common way of locking down dependency versions in Lua |
| Maven | N/A | This function would be done within the project's regular metadata file (pom.xml) and so could not be measured readily with this technique |
| Node.js/NPM | package-lock.json, npm-shrinkwrap.json | These are both npm lock files with some semantic differences: npm-shrinkwrap is intended to be published; package-lock is not; however, both can be found in GitHub projects.
|
| NuGet | project.lock.json | The NuGet blog suggests saving this file to a repository to lock in dependency versions. |
| Perl/CPAN | cpanfile.snapshot | We could not find evidence of a canonical way to do this in CPAN, but one recommendation was a third-party package called Carton that creates this snapshot file. |
| PHP/Packagist | composer.lock | |
| Python/PyPI | N/A | We could not find evidence of a canonical way to do this on PyPI; a StackOverflow post suggested that there are several nonstandard alternatives. |
| R/Bioconductor | packrat.lock | Not canonically standard, but common and well-known. However, it is mostly irrelevant for Bioconductor, since a set of mutually compatible packages are released as a unit. |
| R/CRAN | packrat.lock | Not canonically standard, but common and well-known. |
| Ruby/Rubygems | Gemfile.lock | |
| Rust/Cargo | Cargo.lock | |

27 [https://docs.npmjs.com/files/package-lock.json](https://docs.npmjs.com/files/package-lock.json).
28 [https://blog.nuget.org/20181217/Enable-repeatable-package-restores-using-a-lock-file.html](https://blog.nuget.org/20181217/Enable-repeatable-package-restores-using-a-lock-file.html).
29 [https://metacpan.org/pod/Carton](https://metacpan.org/pod/Carton).
30 [https://stackoverflow.com/questions/8726207/what-are-the-python-equivalents-to-rubys-bundler-perls-carton](https://stackoverflow.com/questions/8726207/what-are-the-python-equivalents-to-rubys-bundler-perls-carton).

ACKNOWLEDGMENTS

We want to thank Audris Mockus and the WoC project at the University of Tennessee, Knoxville, for access to the WoC archive [57] for data mining; the many people interviewed and surveyed; and those who helped with the design and promotion of the survey.

REFERENCES

[1] Pietro Abate, Roberto Di Cosmo, Ralf Treinen, and Stefano Zacchiroli. 2011. MPM: A modular package manager. In Proceedings of the International Symposium on Component Based Software Engineering (CBSE'11). ACM Press, New York, 179–188. DOI: https://doi.org/10.1145/2000229.2000255

[2] Rabe Abdalkareem. 2017. Reasons and drawbacks of using trivial npm packages: The developers' perspective. In Proceedings of the 11th Joint Meeting on Foundations of Software Engineering (ESEC/FSE'17). ACM, New York, NY, 1062–1064.

[3] Cyrille Artho, Kuniyasu Suzaki, Roberto Di Cosmo, Ralf Treinen, and Stefano Zacchiroli. 2012. Why do software packages conflict? In Proceedings of the IEEE International Working Conference on Mining Software Repositories. 141–150.

[4] Anat Bardi and Shalom H. Schwartz. 2003. Values and behavior: Strength and structure of relations. Personal. Soc. Psychol. Bull. 29, 10 (2003), 1207–1220.

[5] Gabriele Bavota, Gerardo Canfora, Massimiliano Di Penta, Rocco Oliveto, and Sebastiano Panichella. 2015. How the Apache community upgrades dependencies: An evolutionary study. Empir. Softw. Eng. 20, 5 (2015), 1275–1317.

[6] Christopher Bogart, Christian Kästner, James Herbsleb, and Ferdian Thung. 2016. How to break an API: Cost negotiation and community values in three software ecosystems. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE'16). ACM Press, New York.

[7] Christopher Bogart, Anna Filippova, James Herbsleb, and Christian Kästner. 2017. Culture and Breaking Change: A Survey of Values and Practices in 18 Open Source Software Ecosystems. DOI: https://doi.org/10.1184/R1/5108716.v1

[8] Shawn A. Bohner and Robert S. Arnold. 1996. Software Change Impact Analysis.
IEEE Computer Society Press, Los Alamitos, CA.

[9] Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology. Qualit. Res. Psychol. 3, 2 (2006), 77–101. DOI: https://doi.org/10.1191/1478088706qp063oa

[10] A. Brito, L. Xavier, A. Hora, and M. T. Valente. 2018. Why and how Java developers break APIs. In Proceedings of the IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER'18). 255–265.

[11] Javier Luis Cánovas Izquierdo and Jordi Cabot. 2015. Enabling the definition and enforcement of governance rules in open source systems. In Proceedings of the International Conference on Software Engineering (ICSE'15). 505–514. DOI: https://doi.org/10.1109/ICSE.2015.184

[12] Jaepil Choi and Heli Wang. 2007. The promise of a managerial values approach to corporate philanthropy. J. Bus. Ethics 75, 4 (2007), 345–359.

[13] Juliet Corbin and Anselm Strauss. 2014. Criteria for evaluation. In Basics of Qualitative Research: Techniques and Procedures for Developing Grounded Theory (3rd ed.). Sage Publications, Inc.

[14] Bradley E. Cossette and Robert J. Walker. 2012. Seeking the ground truth: A retroactive study on the evolution and migration of software libraries. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE'12). ACM Press, New York, 55.

[15] John W. Creswell and J. David Creswell. 2014. Research Design: Qualitative, Quantitative, and Mixed Methods Approaches (4th ed.). Sage Publications.

[16] Mary Crossan, Daina Mazutis, and Gerard Seijts. 2013. In search of virtue: The role of virtues, values and character strengths in ethical decision making. J. Bus. Ethics 113, 4 (2013), 567–581.

[17] Laura Dabbish, Colleen Stuart, Jason Tsay, and Jim Herbsleb. 2012. Social coding in GitHub: Transparency and collaboration in an open software repository. In Proceedings of the Conference on Computer Supported Cooperative Work (CSCW'12). 1277–1286.

[18] Barthélémy Dagenais and Martin P. Robillard. 2010. Creating and evolving developer documentation: Understanding the decisions of open source contributors. In Proceedings of the ACM International Symposium on Foundations of Software Engineering. 127–136. DOI: https://doi.org/10.1145/1882291.1882312

[19] Cleidson R. B. de Souza and David F. Redmiles. 2008. An empirical study of software developers' management of dependencies and changes. In Proceedings of the International Conference on Software Engineering (ICSE'08).

[20] Cleidson R. B. De Souza and David F. Redmiles. 2009. On the roles of APIs in the coordination of collaborative software development. Comput. Supp. Coop. Work 18, 5-6 (2009), 445–475. DOI: https://doi.org/10.1007/s10606-009-9101-3

[21] Alexandre Decan, Tom Mens, Maëlick Claes, and Philippe Grosjean. 2016. When GitHub meets CRAN: An analysis of inter-repository package dependency problems. In Proceedings of the International Conference on Software Analysis, Evolution, and Reengineering. 493–504. DOI: https://doi.org/10.1109/SANER.2016.12

[22] Alexandre Decan, Tom Mens, and Maëlick Claes. 2017. An empirical comparison of dependency issues in OSS packaging ecosystems. In Proceedings of the International Conference on Software Analysis, Evolution, and Reengineering (SANER'17).

[23] Dedoose. 2016. Version 7.0.23. Web Application for Managing, Analyzing, and Presenting Qualitative and Mixed Method Research Data.
SocioCultural Research Consultants, LLC, Los Angeles, CA. Retrieved from www.dedoose.com

[24] Jim des Rivières. 2005. API First. Retrieved from http://www.eclipsecon.org/2005/presentations/EclipseCon2005_12.2APIFirst.pdf

[25] Jim des Rivières. 2007. Evolving Java-based APIs. Retrieved from https://wiki.eclipse.org/Evolving_Java-based_APIs

[26] Jens Dietrich, David J. Pearce, Jacob Stringer, and Kelly Blincoe. 2019. Dependency versioning in the wild. In Proceedings of the Conference on Mining Software Repositories (MSR'19). 349–359. DOI: https://doi.org/10.1109/MSR.2019.00061

[27] Don A. Dillman, Jolene D. Smyth, and Leah Melani Christian. 2014. Internet, Phone, Mail, and Mixed-mode Surveys: The Tailored Design Method. John Wiley & Sons.

[28] Alexander Eck. 2018. Coordination across open source software communities: Findings from the Rails ecosystem. In Tagungsband Multikonferenz Wirtschaftsinformatik (MKWI'18). 109–120.

[29] Stephen G. Eick, Todd L. Graves, Alan F. Karr, J. S. Marron, and Audris Mockus. 2001. Does code decay? Assessing the evidence from change management data. IEEE Trans. Softw. Eng. 27, 1 (Jan. 2001), 1–12. DOI: https://doi.org/10.1109/32.895984

[30] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. 1995. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, Boston, MA.

[31] R. Stuart Geiger. 2017. Summary analysis of the 2017 GitHub open source survey. CoRR abs/1706.02777 (2017).

[32] Gemnasium. 2017. Gemnasium. Retrieved on 28 April, 2021 from https://web.archive.org/web/20180324121439/https://gemnasium.com/

[33] Mohammad Gharehyazie, Baishakhi Ray, and Vladimir Filkov. 2017. Some from here, some from there: Cross-project code reuse in GitHub. In Proceedings of the IEEE International Working Conference on Mining Software Repositories. 291–301. DOI: https://doi.org/10.1109/MSR.2017.15

[34] GitHub, Inc. 2017. Open Source Survey 2017. Retrieved on 28 April, 2021 from http://opensourcesurvey.org/2017/

[35] The Neighbourhoodie Software GmbH. 2017. Greenkeeper.io. Retrieved on 28 April, 2021 from https://web.archive.org/web/20180224075015/https://greenkeeper.io/

[36] Johannes Henkel and Amer Diwan. 2005. CatchUp!: Capturing and replaying refactorings to support API evolution. In Proceedings of the International Conference on Software Engineering (ICSE'05). ACM Press, New York, 274–283.

[37] Steven Hitlin and Jane Allyn Piliavin. 2004. Values: Reviving a dormant concept. Ann. Rev. Sociol. 30, 1 (2004), 359–393.

[38] Reid Holmes and Robert J. Walker. 2010. Customized awareness: Recommending relevant external change events. In Proceedings of the International Conference on Software Engineering (ICSE'10). ACM Press, New York, 465–474. DOI: https://doi.org/10.1145/1806799.1806867

[39] Daqing Hou and Xiaojia Yao. 2011. Exploring the intent behind API evolution: A case study. In Proceedings of the Working Conference on Reverse Engineering (WCRE'11). IEEE Computer Society, Los Alamitos, CA, 131–140.

[40] Marco Iansiti and Roy Levien. 2004. The Keystone Advantage: What the New Dynamics of Business Ecosystems Mean for Strategy, Innovation, and Sustainability. Harvard Business Press, Boston, MA.

[41] Javier Luis Cánovas Izquierdo and Jordi Cabot. 2015. Enabling the definition and enforcement of governance rules in open source systems. In Proceedings of the International Conference on Software Engineering (ICSE'15).
IEEE, 505–514.

[42] Steven J. Jackson, David Ribes, Ayse G. Buyuktur, and Geoffrey C. Bowker. 2011. Collaborative rhythm: Temporal dissonance and alignment in collaborative scientific work. In Proceedings of the Conference on Computer Supported Cooperative Work (CSCW'11). 245–254.

[43] Slinger Jansen and Michael A. Cusumano. 2013. Defining software ecosystems: A survey of software platforms and business network governance. In Software Ecosystems: Analyzing and Managing Business Networks in the Software Industry. Edward Elgar Publishing.

[44] Puneet Kapur, Brad Cossette, and Robert J. Walker. 2010. Refactoring references for library migration. In Proceedings of the International Conference on Object-oriented Programming, Systems, Languages and Applications (OOPSLA'10). ACM Press, New York, 726–738. DOI: https://doi.org/10.1145/1869459.1869518

[45] Smitha Keertipati, Sherlock A. Licorish, and Bastin Tony Roy Savarimuthu. 2016. Exploring decision-making processes in Python. In Proceedings of the International Conference on Evaluation and Assessment in Software Engineering. ACM, 43.

[46] Riivo Kikas, Georgios Gousios, Marlon Dumas, and Dietmar Pfahl. 2017. Structure and evolution of package dependency networks. In Proceedings of the 14th International Conference on Mining Software Repositories (MSR'17). IEEE Press, Piscataway, NJ, 102–112.

[47] Daniel Le Berre and Pascal Rapicault. 2009. Dependency management for the Eclipse ecosystem: Eclipse p2, metadata and resolution. In Proceedings of the International Workshop on Open Component Ecosystems (IWOCE'09). 21–30. DOI: https://doi.org/10.1145/1595800.1595805

[48] Mario Linares-Vásquez, Gabriele Bavota, Carlos Bernal-Cárdenas, Massimiliano Di Penta, Rocco Oliveto, and Denys Poshyvanyk. 2013. API change and fault proneness: A threat to the success of Android apps. In Proceedings of the European Software Engineering Conference/Foundations of Software Engineering (ESEC/FSE'13). ACM Press, New York, 477–487.

[49] Cristina V. Lopes, Petr Maj, Pedro Martins, Vaibhav Saini, Di Yang, Jakub Zitny, Hitesh Sajnani, and Jan Vitek. 2017. DéjàVu: A map of code duplicates on GitHub. Proc. ACM Program. Lang. 1, OOPSLA (2017), 1–28. DOI: https://doi.org/10.1145/3133908

[50] Mircea F. Lungu. 2009. Reverse Engineering Software Ecosystems. Ph.D. Dissertation. University of Lugano.

[51] Fabio Mancinelli, Jaap Boender, Roberto Di Cosmo, Jerome Vouillon, Berke Durak, Xavier Leroy, and Ralf Treinen. 2006. Managing the complexity of large free and open source package-based software distributions. In Proceedings of the International Conference on Automated Software Engineering (ASE'06). 199–208. DOI: https://doi.org/10.1109/ASE.2006.49

[52] Konstantinos Manikas. 2016. Revisiting software ecosystems research: A longitudinal literature study. J. Syst. Softw. 117 (2016), 84–103.

[53] Michael Mattsson and Jan Bosch. 2000. Stability assessment of evolving industrial object-oriented frameworks. J. Softw. Maint.: Res. Pract. 12, 2 (2000), 79–102.

[54] Tyler McDonnell, Baishakhi Ray, and Miryung Kim. 2013. An empirical study of API stability and adoption in the Android ecosystem. In Proceedings of the International Conference on Software Maintenance (ICSM'13). IEEE Computer Society, Los Alamitos, CA.

[55] T. Mens. 2016. An ecosystemic and socio-technical view on software maintenance and evolution. In Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME'16). 1–8.

[56] David G.
Messerschmitt and Clemens Szyperski. 2005. Software Ecosystem: Understanding an Indispensable Technology and Industry. MIT Press.

[57] Audris Mockus. 2009. Amassing and indexing a large sample of version control systems: Towards the census of public source code history. In Proceedings of the IEEE Conference on Mining Software Repositories (MSR'09).

[58] Emerson Murphy-Hill, Thomas Zimmermann, and Nachiappan Nagappan. 2014. Cowboys, ankle sprains, and keepers of quality: How is video game development different from software development? In Proceedings of the International Conference on Software Engineering (ICSE'14). DOI: https://doi.org/10.1145/2568225.2568226

[59] Linda Northrop, Peter Feiler, Richard P. Gabriel, John Goodenough, Rick Linger, Tom Longstaff, Rick Kazman, Mark Klein, Douglas Schmidt, Kevin Sullivan, and Kurt Wallnau. 2006. Ultra-large-scale Systems: The Software Challenge of the Future. Software Engineering Institute.

[60] Siobhán O'Mahony and Fabrizio Ferraro. 2007. The emergence of governance in an open source community. Acad. Manag. J. 50, 5 (2007), 1079–1106.

[61] Jeroen Ooms. 2013. Possible directions for improving dependency versioning in R. R Journal 5, 1 (2013), 1–9.

[62] Klaus Ostermann, Paolo G. Giarrusso, Christian Kästner, and Tillmann Rendel. 2011. Revisiting information hiding: Reflections on classical and nonclassical modularity. In Proceedings of the European Conference on Object-oriented Programming (ECOOP'11) (Lecture Notes in Computer Science), Vol. 6813. Springer-Verlag, Berlin, 155–178.

[63] David L. Parnas. 1972. On the criteria to be used in decomposing systems into modules. Commun. ACM 15, 12 (1972), 1053–1058. DOI: https://doi.org/10.1145/361598.361623

[64] Raphael Pham, Leif Singer, Olga Liskin, Fernando Figueira Filho, and Kurt Schneider. 2013. Creating a shared understanding of testing culture on a social coding site. In Proceedings of the International Conference on Software Engineering (ICSE'13). IEEE Computer Society, Los Alamitos, CA, 112–121.

[65] Tom Preston-Werner. 2013. Semantic Versioning 2.0.0. Retrieved from http://semver.org.

[66] Steven Raemaekers, Arie van Deursen, and Joost Visser. 2012. Measuring software library stability through historical version analysis. In Proceedings of the International Conference on Software Maintenance (ICSM'12). IEEE Computer Society, Los Alamitos, CA, 378–387.

[67] Steven Raemaekers, Arie van Deursen, and Joost Visser. 2014. Semantic versioning versus breaking changes: A study of the Maven repository. In Proceedings of the International Working Conference on Source Code Analysis and Manipulation (SCAM'14). IEEE Computer Society, Los Alamitos, CA, 215–224. DOI: https://doi.org/10.1109/SCAM.2014.30

[68] Romain Robbes, Mircea Lungu, and David Röthlisberger. 2012. How do developers react to API deprecation? The case of a Smalltalk ecosystem. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE'12). ACM Press, New York. DOI: https://doi.org/10.1145/2393596.2393662

[69] RStudio Team. 2015. RStudio: Integrated Development for R. Technical Report. RStudio, Inc., Boston, MA. Retrieved from www.rstudio.com

[70] Edgar H. Schein and Peter Schein. 2017. *Organizational Culture and Leadership* (5th ed.). Wiley.

[71] Shalom H. Schwartz. 1992. Universals in the content and structure of values: Theoretical advances and empirical tests in 20 countries.
*Adv. Exper. Soc. Psychol.* 25 (1992), 1–65.

[72] Leif Singer, Fernando Figueira Filho, and Margaret-Anne Storey. 2014. Software engineering at the speed of light: How developers stay current using Twitter. In *Proceedings of the International Conference on Software Engineering (ICSE'14)*. 211–221. DOI: https://doi.org/10.1145/2568225.2568305

[73] Ian Sommerville. 2010. *Software Engineering* (9th ed.). Pearson Addison Wesley.

[74] Diomidis Spinellis. 2012. Package management systems. *IEEE Softw.* 29, 2 (2012), 84–86.

[75] Adam Stacoviak, Andrew Thorp, and Isaac Schlueter. 2013. The Changelog. Retrieved from https://changelog.com/101/.

[76] Peri Tarr, Harold Ossher, William Harrison, and Stanley M. Sutton, Jr. 1999. N degrees of separation: Multi-dimensional separation of concerns. In *Proceedings of the International Conference on Software Engineering (ICSE'99)*. IEEE Computer Society, Los Alamitos, CA, 107–119.

[77] The LibreOffice Design Team. 2017. What Open Source Means To LibreOffice Users. Retrieved from https://design.blog.documentfoundation.org/2017/09/13/open-source-means-libreoffice-users/.

[78] The Rust Team. 2021. The Cargo Book. Retrieved on 28 April, 2021 from https://doc.rust-lang.org/cargo/faq.html#why-do-binaries-have-cargolock-in-version-control-but-not-libraries.

[79] Jonathan Turner. 2016. State of Rust Survey 2016. Retrieved from https://blog.rust-lang.org/2016/06/30/State-of-Rust-Survey-2016.html.

[80] A. Turon and N. Matsakis. 2014. Stability as a Deliverable (The Rust Programming Language Blog). Retrieved from https://blog.rust-lang.org/2014/10/30/Stability.html.

[81] Ivo van den Berk, Slinger Jansen, and Lútzen Luinenburg. 2010. Software ecosystems. In *Proceedings of the European Conference on Software Architecture (ECSA'10)*. 127–134. DOI: https://doi.org/10.1145/1842752.1842781

[82] Bill Venners. 2003. The Philosophy of Ruby: A Conversation with Yukihiro Matsumoto, Part I. Retrieved from http://www.artima.com/intv/rubyP.html.

[83] Jonathan Wareham, Paul B. Fox, and Josep Lluís Cano Giner. 2014. Technology ecosystem governance. *Organiz. Sci.* 25, 4 (2014), 1195–1215.

[84] Mark Weiser. 1984. Program slicing. *IEEE Trans. Softw. Eng.* 10, 4 (1984), 352–357.

[85] Joel West. 2003. How open is open enough? Melding proprietary and open source platform strategies. *Res. Polic.* 32, 7 (2003), 1259–1285.

[86] Joel West and Siobhán O'Mahony. 2008. The role of participation architecture in growing sponsored open source communities. *Industr. Innov.* 15, 2 (2008), 145–168.

[87] Hadley Wickham. 2015. *Releasing a Package*. O'Reilly Media, Sebastopol, CA. Retrieved from http://r-pkgs.had.co.nz/release.html.

[88] Wei Wu, Foutse Khomh, Bram Adams, Yann-Gaël Guéhéneuc, and Giuliano Antoniol. 2015. An exploratory study of API changes and usages based on Apache and Eclipse ecosystems. *Empir. Softw. Eng.* (2015), 1–47. DOI: https://doi.org/10.1007/s10664-015-9411-7

[89] Wei Wu, Foutse Khomh, Bram Adams, Yann-Gaël Guéhéneuc, and Giuliano Antoniol. 2016. An exploratory study of API changes and usages based on Apache and Eclipse ecosystems. *Empir. Softw. Eng.* 21, 6 (2016), 2366–2412.

[90] Laerte Xavier, Aline Brito, Andre Hora, and Marco Tulio Valente. 2017. Historical and impact analysis of API breaking changes: A large-scale study.
In *Proceedings of the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER'17)*. IEEE, 138–147.

[91] Yihui Xie. 2013. R Package Versioning. Retrieved from http://yihui.name/en/2013/06/r-package-versioning/.

[92] Robert K. Yin. 2013. *Case Study Research: Design and Methods* (5th ed.). Sage Publications.

Received August 2019; revised December 2020; accepted January 2021
{"id": "0fb7cb7e84edf292e4cd492af068bda538f45e03", "text": "Maintaining interoperability in open source software: A case study of the Apache PDFBox project\n\nSimon Butler\\textsuperscript{a,}\\textsuperscript{*}, Jonas Gamalielsson\\textsuperscript{a,}\\textsuperscript{*}, Bj\u00f6rn Lundell\\textsuperscript{b,}\\textsuperscript{*}, Christoffer Brax\\textsuperscript{b}, Anders Mattsson\\textsuperscript{c}, Tomas Gustavsson\\textsuperscript{d}, Jonas Feist\\textsuperscript{e}, Erik L\u00f6nroth\\textsuperscript{f}\n\n\\textsuperscript{a}University of Sk\u00f6vde, Sk\u00f6vde, Sweden\n\\textsuperscript{b}Combitech AB, Link\u00f6ping, Sweden\n\\textsuperscript{c}Husqvarna AB, Huskvarna, Sweden\n\\textsuperscript{d}PrimeKey Solutions AB, Stockholm, Sweden\n\\textsuperscript{e}RedBridge AB, Stockholm, Sweden\n\\textsuperscript{f}Scania IT AB, S\u00f6dert\u00e4lje, Sweden\n\n\\textbf{A B S T R A C T}\n\nSoftware interoperability is commonly achieved through the implementation of standards for communication protocols or data representation formats. Standards documents are often complex, difficult to interpret, and may contain errors and inconsistencies, which can lead to differing interpretations and implementations that inhibit interoperability. Through a case study of two years of activity in the Apache PDFBox project we examine day-to-day decisions made concerning implementation of the PDF specifications and standards in a community open source software (OSS) project. Thematic analysis is used to identify semantic themes describing the context of observed decisions concerning interoperability. Fundamental decision types are identified including emulation of the behaviour of dominant implementations and the extent to which to implement the PDF standards. Many factors influencing the decisions are related to the sustainability of the project itself, while other influences result from decisions made by external actors, including the developers of dependencies of PDFBox. This article contributes a fine grained perspective of decision-making about software interoperability by contributors to a community OSS project. The study identifies how decisions made support the continuing technical relevance of the software, and factors that motivate and constrain project activity.\n\n\u00a9 2019 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license. (http://creativecommons.org/licenses/by/4.0/)\n\n1. Introduction\n\nMany software projects seek to implement one or more standards to support interoperability with other software. For example, interconnected systems implement standardised communications protocols, such as the open systems interconnect stack, and web standards, including the hypertext transfer protocol (HTTP) and the secure sockets layer (SSL), to support information exchange and commercial activities (Wilson, 1998; Treese, 1999; Ko et al., 2011).\n\nAs businesses and civil society \u2014 governments at national and local level, and the legal system \u2014 move away from paper documents (Lundell, 2011; Rossi et al., 2008) to rely increasingly on digitised systems, the implementation of both communication protocols and document standards becomes ever more crucial (Rossi et al., 2008; Wilson et al., 2017; Lehtonen et al., 2018). Standards are written by humans, and despite the care taken in their creation they are imperfect, vague, ambiguous and open to interpretation when implemented in software (Allman, 2011; Egyedi, 2007). 
Furthermore, practice evolves so that implementations, often seen as the de facto reference for a standard, can diverge from the published standard, as has been the case with the JPEG image format (Richter and Clark, 2018). Indeed, practice can repeatedly deviate from standards, as with HTML, CSS and JavaScript, sometimes with the intention of locking users in to specific products (W3C, 2019a; Bouvier, 1995; Phillips, 1998), with the consequence that web content becomes challenging to implement and access (Phillips, 1998), and to archive (Kelly et al., 2014).

While software interoperability relies on standards, different software implementations of a given standard are interpretations of the standard that may not be fully interoperable (Egyedi, 2007). Consequently, the developers of software implementations become involved in a discourse to find a common understanding of the standard that supports interoperability, as illustrated by Allman (2011), Lehtonen et al. (2018), and Watteyne et al. (2016). The means by which interoperability is achieved varies. The Internet Engineering Task Force (IETF) (IETF, 2019a), for example, uses a process, often summarised as "rough consensus and running code" (Davies and Hoffmann, 2004), that requires that interoperability between independent implementations be achieved early in the standardisation process (Wilson, 1998). An increasing proportion of software that implements communication and data standards, particularly where it is non-differentiating, is developed through collaboration by companies working in community open source software (OSS) projects (Lundell et al., 2017; Butler et al., 2019). By community OSS project we mean an OSS project managed by a foundation or collectively organised (Riehle, 2011), where many of the developers are directed by companies and other organisations, and collaborate to create high quality software (Fitzgerald, 2006). Examples of this process include the OSS projects under the umbrella of the Eclipse Internet of Things Working Group (Eclipse IoT Working Group, 2019), and LibreOffice (The Document Foundation, 2019). In many cases and domains both OSS and proprietary solutions are available for the same standard and need to interoperate to remain relevant products. While the literature documents the process of standardisation, and the technical challenges of implementing standards-compliant software, there is little research that focuses on how participants in OSS projects decide how to implement a standard, and how to revise their implementation to correct or improve its behaviour. To explicate the challenges facing community OSS projects developing standards-compliant software, and the day-to-day decisions made by contributors, this study investigates the following research question:

How does a community OSS project maintain software interoperability?

We address the research question through a case study (Gerring, 2017; Walsham, 2006) of two years of contributions to the Apache PDFBox OSS project. The PDFBox project is governed by the Apache Software Foundation (ASF) (ASF, 2019a) and develops and maintains a mature (Black Duck, 2019) Java library and tools to create and process Portable Document Format (PDF) documents (Lehmkuhler, 2010). PDFBox is used in other OSS projects (Apache Tika, 2019; CEF Digital, 2019; Khudairi, 2017), and as a component in proprietary products and services.
PDFBox is described further in Section 3.2.

Developed in the 1990s, PDF is a widely used file format for distributing documents, which are created, processed and read by many different applications on multiple platforms. Versions of PDF are defined in a number of specifications and standards documents, including formal (ISO) standards, that implementers need to follow to ensure the interoperability of their software. There is evidence that the PDF standards are challenging to implement (Bogk and Schöpl, 2014; Endignoux et al., 2016), that the quality of PDF documents varies (Lehtonen et al., 2018; Lindlar et al., 2017), and that the dominance of Adobe’s software products creates user expectations that need to be met by the developers of other PDF software (Gamalielsson and Lundell, 2013; Endignoux et al., 2016; Amiouny, 2016; 2017). In the following section we provide a background description of PDF, and also review the related academic literature.

Section 3 details the reasons for the purposeful sampling (Patton, 2015) of PDFBox as the case study subject. We also identify the data sources investigated for the case study and give an account of the application of thematic analysis (Braun and Clarke, 2006) to identify semantic themes in the types of decisions concerning the interoperability of PDFBox made by contributors to the project, and the factors influencing those decisions.

Through the analysis of the data we identified four fundamental types of decision made concerning the interoperability of PDFBox related to compliance with published PDF specifications and standards. The types of decision and the technical circumstances in which they are made are described in Section 4. We also provide an account of the factors identified that influence those decisions, including resources, knowledge, and the influence of external actors, such as the developers of other PDF software and the creators of documents. We discuss the challenges faced by the PDFBox project in Section 5, including the technical challenges faced by the developers of PDF software, and potential solutions. Thereafter, we consider how the behaviour of contributors to the PDFBox project sustains the project in the long term. Lastly, we present the conclusions in Section 6 and identify the contributions made by this study.

2. Background and related work

2.1. Standards development and interoperability

The development of standards for information and communications technologies is undertaken by companies and other organisations using a range of approaches, which differ, for example, in whether the technology is implemented before the standard is developed, and in the working practices of the standards body involved. One perspective is that standards have two different types of origin: some are specified by standards bodies, e.g. ISO and ITU, while others arise through extensive or widespread use of a particular technology, regardless of whether it was developed by one company or collaboratively (Treese, 1999). Another perspective is that standards are either requirement-led or implementation-led (Phipps, 2019). Phipps, a director (and sometime President) of the Open Source Initiative, argues that the primary use of the requirement-led model is where standardisation is used to create a market, for example the development of 5G (Nikolich et al., 2017).
In contrast, implementation-led standards are developed to support an innovation in software or a data format that has been adopted by a wider audience than the creating company, where standardisation is necessary to support interoperability. A third view is provided by Lundell and Gamalielsson (2017), who identify standards that are developed before software, software that is implemented and then forms the basis of a standardisation process (including that of PDF), and standards developed in parallel with software. The latter process is identified as being of increasing importance in the telecommunications industry (Wright and Druta, 2014), and examples can be found in the standardisation process for internet protocols managed by the IETF (IETF, 2019a). The IETF emphasises interoperability at an early stage of protocol development, rather than technical perfection (Bradner, 1996; Wilson, 1998; Bradner, 1999). The process of developing interoperability between low powered devices in the IoT domain is described by Ko et al. (2011). They record the development of the internet protocol (IP) in 6LoWPAN to provide interoperable communications stacks for two IoT operating systems, Contiki-OS and TinyOS. The interoperable implementations are then used to determine whether the solutions achieved are practicable for the types of IoT devices expected to use them (Ko et al., 2011).

A further approach to interoperability is the development of implementations of standards, particularly communication protocols, in OSS projects. Companies participating in the Eclipse IoT Working Group (2019), for example, collaborate, sometimes with competitors, in OSS projects to develop implementations of open communications standards used in the IoT domain that then support their products (Butler et al., 2019). Examples include the implementation of the Open Mobile Alliance’s (OMA, 2019) lightweight machine to machine (LWM2M) protocol in Leshan (Eclipse Foundation, 2019b) and Wakaama (Eclipse Foundation, 2019c), and of the constrained application protocol (CoAP) (Shelby et al., 2014) in Californium (Eclipse Foundation, 2019a). Additionally, the collaborative OSS project serves to identify and document cogent misinterpretations and misunderstandings of the standard (Butler et al., 2019).

2.2. PDF standards and interoperability

Adobe Systems developed PDF as a platform-independent interchange format for documents that preserves presentation independently of the application and operating system. In 1993, the first PDF specification was made freely available, and a number of revisions of the specification have been published since (see Table 1). Some versions of the specification have been published as ISO standards (e.g. ISO 32000-1:2008 and ISO 32000-2:2017), and there are specialised subsets of the PDF format for the print industry (e.g. ISO 15929:2002 and ISO 15930-1:2001) and for engineering applications (e.g. ISO 24517-1:2008).

PDF documents vary in size and complexity from single page tickets, receipts and order summaries, through academic papers, to very large documents, such as government reports and books. Consequently, PDF documents may have short lifespans, or may have a significantly longer life as business and legal records, particularly as organisations move away from paper. Many different software packages exist to create, display, edit and process PDF files.
Further, a significant problem for the long-term use of PDF is that many documents will outlive the software used to create them (Gamalielsson and Lundell, 2013), so standards compliant software that can faithfully reproduce the documents will need to be available at some arbitrary point in the future.

PDF software, therefore, does not work in isolation; it must interoperate with other software, to the extent that implementations need to be able to process documents created by other software, regardless of how long ago, and to create documents that other implementations can read. Furthermore, there is the requirement that those documents be readable many years in the future, particularly in the case of documents such as contracts and official documentation issued by governmental agencies. These requirements are not a theoretical exercise; they already pose practical problems for organisations and businesses. For example, in the dataset examined for this article there is evidence that contractors for the Government of the Netherlands have created many thousands of official academic transcripts as PDF documents that do not comply with the PDF specifications and are, at best, problematic to process (see mailing list thread Users-1, Table 5 on p. 8).

PDF is a complex file format that is used to create documents with a rich variety of content including text, images, internal document links, indexes, fillable forms, and digital signatures. Each version of the PDF standard cites normative references — other standards — that form part of the standard and are described as “... indispensable for the application of this document” in ISO 32000-1:2008 (ISO, 2008). The normative references include standards for fonts, image formats, and character encodings. In addition, several normatively referenced standards include normative references themselves (and so on). For example, among the normative references of ISO 32000-2:2017 is part 1 of an early revision of the JPEG 2000 ISO standard (ISO/IEC 15444-1:2004), which in turn has 13 normative references, including 10 IEC, ISO and ISO/IEC standards. The specifications and standards also define the declarative programming language that describes PDF documents, as well as the expected behaviours and capabilities of programs that create and process PDF documents. The size and complexity of the PDF specifications and ISO standards themselves pose a daunting challenge for software developers implementing them. The recently published ISO 32000-2:2017 standard, for example, consists of 984 pages and has 90 normative references (ISO, 2017). Further challenges complicate the development of software that works with PDF files. A key challenge is the common perception that the Adobe Reader family of software applications are the de facto reference implementations of the PDF specifications and standards, against which the performance of other implementations is compared (Amiouny, 2016; Lehtonen et al., 2018). Another source of difficulty is the Robustness Principle (Allman, 2011), otherwise known as Postel’s Law, which is applied in Adobe’s Reader products, and was stated by Postel, in the context of communication protocols, as: “... be conservative in what you do, be liberal in what you accept from others.” (Postel, 1981). In practice, PDF reading and processing software implements repair mechanisms to allow malformed files to be read, within limitations.
The limitations, however, are only documented in the behaviour of Adobe’s products.

2.3. Related work

A key aspect of software interoperability is the agreement on, and documentation of, data formats and communication protocols in specifications and standards. There are many practical challenges to the standardisation process, and a number of approaches have been tried. Ahlgren et al. (2016) argue that open standardisation processes are needed to support interoperability in the IoT domain. An example can be found in the development of implementations of the 6TiSCH communications protocol for low-power devices (Watteyne et al., 2016). Watteyne et al. describe an iterative process of interoperability testing between implementations and how the lessons learnt through testing inform further iterations of the standardisation process. Another example is the standardisation of the QUIC protocol. Originally implemented by Google, QUIC has been in use for some six years, and a standard is being developed by an IETF committee (IETF, 2019b; 2019c; Piraux et al., 2018).

Table 1
Selected PDF versions and ISO standards.

| Version | ISO Standard | Year | Comment |
|---------|--------------|------|---------|
| PDF v1.0 | | 1993 | First published PDF specification. |
| PDF v1.4 | | 2001 | Improved encryption, added XML metadata, and pre-defined CMaps. |
| PDF v1.5 | | 2003 | Added JPEG 2000 images and improved encryption. |
| PDF/A-1 | ISO 19005-1:2005 | 2005 | An archive format for standalone PDF documents based on PDF v1.4. |
| PDF v1.7 | ISO 32000-1:2008 | 2008 | Extended range of support for encryption. |
| PDF/A-2 | ISO 19005-2:2011 | 2011 | An archive format for standalone PDF documents based on ISO 32000-1:2008. |
| PDF/A-3 | ISO 19005-3:2012 | 2012 | An extension of PDF/A-2 to support file embedding. |
| PDF v2.0 | ISO 32000-2:2017 | 2017 | Revision of ISO 32000-1:2008. |

Piraux et al. (2018) evaluated the interoperability of fifteen implementations of QUIC, finding shortcomings in all of them. The tests developed by Piraux et al. have since been incorporated in the test suites of some of the implementations tested (Piraux et al., 2018).

Standardisation processes can take a long time, and consequently may be seen by some as an inhibitor of innovation. De Coninck et al. (2019), for example, cite the slowness of the QUIC standardisation process as motivation for a proposed plugin mechanism to extend QUIC. They have proposed, implemented and investigated a flexible approach in which applications communicating with QUIC negotiate which extensions to use during connection set-up (De Coninck et al., 2019).

Standards are also long-lived, and require review and revision in response to developments in both practice and technology. The Joint Photographic Experts Group (JPEG) has initiated a number of standardisation efforts to update the 25 year old JPEG standards for image files, including the JPEG XT project (JPEG, 2019). Richter and Clark (2018) identify how JPEG implementations differ from the standard, and the difficulties of applying the JPEG conformance testing protocol published in ISO 10918-5:2013 (ISO, 2013) to current implementations. Richter and Clark identify two key issues. Firstly, the evolution of a body of practice building on the standard during the 25 years since it was made available, which motivates the standardisation review.
Secondly, parts of the current standard are not used in practice, and may no longer need to be part of any revised standard (Richter and Clark, 2018).

The standardisation of HTML, CSS, and other web technologies followed a different path. Standards for both HTML and CSS have been developed by the World Wide Web Consortium (W3C) (W3C, 2019b) since the 1990s (W3C, 2019a), initially under the auspices of the IETF (Bouvier, 1995). During the browser wars (Bouvier, 1995), companies would add functionality to their browsers to extend the standard, and encourage web site developers to create content specifically for innovative features found in one browser. The process of developing websites to support variations in HTML became so onerous for developers that practitioners campaigned for Microsoft and Netscape to adhere to W3C standards (Phillips, 1998; WaSP, 2019).

Previous research on the development of PDF software in two OSS projects found that developers adopted specific strategies to support interoperability (Gamalielsson and Lundell, 2013). Specifically, developers would exceed the specification, and would mimic a dominant implementation so that their software complied with that implementation. In addition, the study illuminated difficulties developers had interpreting the PDF standard. One issue identified was a lack of detail in parts of the specification that made software implementation imprecise and unreliable. Another concern expressed was that the complexity of the specification inhibited implementation (Gamalielsson and Lundell, 2013). Indeed, analyses of PDF from the perspective of creating parsers have found the task to be challenging (Bogk and Schöpl, 2014; Endignoux et al., 2016). As part of their investigation of PDF, Endignoux et al. (2016) identify ambiguities in the file structures, which they used to discover bugs in a number of PDF readers. Bogk and Schöpl (2014) describe the experience of trying to create a formally verified parser for PDF. They advise that the creators of future file format definitions should ensure that the format is “... complete, unambiguous and doesn’t allow unparseable constructions.” (Bogk and Schöpl, 2014). In practice, the complexity of the PDF specifications can lead to significant security vulnerabilities in software implementations (Mladenov et al., 2018a; 2018b).

The PDF/A standards (see Table 1) are used in document preservation. An area of concern is the management of documents that do not comply with the PDF/A standards. Lehtonen et al. (2018) identify the complexity of the problems faced by those handling documents, and explore mechanisms through which documents might be repaired so that they are “well-formed and valid PDF/A files.” The team behind the development of veraPDF, a PDF/A validator, identify difficulties interpreting the PDF/A standard (Wilson et al., 2017) when creating validation tests that represent a clear understanding of the standards. Additionally, Wilson et al. (2017) record the need to limit the scope of the validation tests implemented in veraPDF because of the scale of the task, particularly in the validation of normative references such as JPEG 2000. Lindlar et al. (2017) record the development of a test set of PDF documents to test the conformance of PDF files with the structural and syntactic requirements of ISO 32000-1:2008.
The authors argue that a test set used to examine basic well-formedness requirements is helpful in digital preservation, as it simplifies the detection of specific problems as a precursor to the application of document repair techniques (Lindlar et al., 2017).

In summary, previous research shows the necessity of standardisation for software interoperability, and details approaches to standardisation. Research has also identified how practice can deviate from standards, and, in the case of PDF, the practical difficulties of developing software and the challenges of creating mechanisms to evaluate standards compliance. The challenges of implementing standards have also been recorded. However, there is a lack of research that examines the nature of the day-to-day practical decision-making of software developers when implementing a standard.

3. Research approach

We undertake a case study (Gerring, 2017; Walsham, 2006) of a single, purposefully-sampled (Patton, 2015) community OSS project, focusing on the challenges contributors face when creating and maintaining interoperable software and how they collaborate to resolve problems.

3.1. Case selection

Apache PDFBox was selected as a relevant subject for the case study for several reasons. Firstly, for PDFBox to have any value for users it must be able to interoperate with other software that reads and writes PDF documents. As such, it must implement a sufficient portion of the PDF specifications and standards to be perceived as a viable solution. Secondly, the PDF specifications and standards are complex and documented as challenging to implement, with the additional requirement that implementations need to process a wide variety of conforming and non-conforming documents to emulate the functionality of a dominant implementation. Thirdly, though the software produced by the OSS project is most likely to be used in business settings, PDFBox is an ASF project and is independent of direct company control. Consequently, contributors to PDFBox are obliged to rely on cooperation with others in the community to achieve their goals. Fourthly, the PDFBox project actively develops and maintains software, responds to reports of issues with the software, and releases revisions of the software frequently.

The scope of the investigation is the publicly documented work contributing to nine releases of Apache PDFBox, between the release of v2.0.3 in September 2016 and the release of v2.0.12 in October 2018. The period investigated was specifically chosen to include the publication of the ISO 32000-2:2017 standard, also known as PDF v2.0, in August 2017.

3.2. Case description

The Apache PDFBox project develops a Java library and command line tools that can create and process PDF files. The library is relatively low level and can be used to create and process PDF documents conforming to different versions of the PDF specifications and ISO standards (see Table 1 for examples). In development since 2002, and an ASF governed project since 2008, PDFBox is maintained by a small group of core developers and an active community of contributors. PDFBox is a dependency of some other ASF projects, including Apache Tika (Apache Tika, 2019), and of other OSS projects, including the European Union funded Digital Signature Services project (CEF Digital, 2019). PDFBox is used to parse documents in one version of the veraPDF validator (veraPDF, 2019), as well as being used in proprietary software products and services.
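To make the library’s role concrete, the following minimal example is written against the PDFBox 2.0.x API in use during the study period; it loads a document and extracts its text content (the file name is illustrative):

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// Minimal PDFBox 2.0.x usage: load a PDF and extract its text content.
public class ExtractText {
    public static void main(String[] args) throws IOException {
        try (PDDocument document = PDDocument.load(new File("report.pdf"))) {
            String text = new PDFTextStripper().getText(document);
            System.out.println(text);
        }
    }
}
```

Text extraction of this kind underpins the document-mining use noted below.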
PDFBox was also part of the software suite used by journalists to extract information from PDF files amongst the documents collectively known as the Panama Papers (Khudairi, 2017; ICJ, 2019).

At the time of the study, the most recent major revision of PDFBox, v2.0.0, had been released in March 2016, and maintenance releases have generally been made approximately every two to three months since. In addition, the project maintains an older version, v1.8, in which bugs are fixed, and releases made less often. The overwhelming majority of bug fixes made in the 2.0.x series are backported to the 1.8.x series. The project is also working towards a major revision in v3.0.

3.3. Data collection

The core data for the case study consists of the online archives of activity in the PDFBox project. Using the PDFBox website (Apache PDFBox, 2019) we identified the communication channels available for making contributions, and the resources available for users of the software and contributors (see Table 2). Three public communication channels can be used to make contributions: the Jira issue tracker, and the developers and users mailing lists. In addition there is a commits mailing list that reports the commits made to the PDFBox source code repository through messages generated by the version control system. A read-only mirror of the PDFBox source code is also provided on GitHub.

The identified mailing list archives were downloaded from the ASF mail archives (ASF, 2019b), and the GrimoireLab Perceval component (Bitergia, 2019) was used to parse the Mbox format files and convert them into JSON format files. The JSON files were then processed using Python scripts to reconstruct the email threads and write the threads out in emacs org-mode files for analysis (org-mode is a plain text format for emacs that supports text folding and annotation). The Jira issue tracker tickets were retrieved in JSON format using the Jira REST API (Atlassian, 2019). The JSON records for each ticket were then aggregated and processed by Python scripts to create org-mode files containing the problem description and the comments on the ticket.

3.4. Data analysis

The data gathered from the PDFBox project was analysed using the thematic analysis framework (Braun and Clarke, 2006).

Initially, the first author worked systematically through all the collected data to identify the email threads and issue tracker tickets that address the topic of interoperability in any regard. The mailing list threads and issue tracker tickets cover a wide range of topics, including project administration as well as help requests and potential bug reports. Key factors considered included reference to the capabilities of PDFBox in comparison to other PDF processing software, and mention of any PDF specification or standard or any of its normative references, such as font and image formats. During this phase, email threads were reconstructed where parts of conversations with the same subject line had been recorded in the archives as separate threads.3 (A sketch of this threading logic is shown below.)

The set of candidate email threads and issue tracker tickets were then examined in more detail to identify discussions in which decisions were made concerning the implementation of functionality related to the PDF specifications and standards, and their normative references, in PDFBox and other software.
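The reconstruction scripts themselves were written in Python; purely as an illustration of the threading logic, the following Java sketch (all names are illustrative) groups archived messages into threads by following In-Reply-To headers, falling back to a normalised subject line when the reference header was omitted:

```java
import java.util.*;

// Illustrative sketch, not the project's actual (Python) tooling:
// rebuild mailing-list threads from message headers.
final class ThreadReconstructor {

    record Message(String id, String inReplyTo, String subject) {}

    // Group messages by the root of the reply chain they belong to.
    static Map<String, List<Message>> rebuildThreads(List<Message> messages) {
        Map<String, Message> byId = new HashMap<>();
        for (Message m : messages) byId.put(m.id(), m);

        Map<String, List<Message>> threads = new LinkedHashMap<>();
        for (Message m : messages) {
            threads.computeIfAbsent(rootOf(m, byId), k -> new ArrayList<>()).add(m);
        }
        return threads;
    }

    private static String rootOf(Message m, Map<String, Message> byId) {
        Message current = m;
        // Walk up the reply chain while the parent exists in the archive.
        while (current.inReplyTo() != null && byId.containsKey(current.inReplyTo())) {
            current = byId.get(current.inReplyTo());
        }
        if (current.inReplyTo() == null) {
            return current.id(); // a genuine thread root
        }
        // Broken chain (the reference header was omitted or the parent is
        // missing): fall back to the subject so "Re: Foo" rejoins "Foo".
        return current.subject().replaceFirst("(?i)^(re|fwd):\\s*", "").trim();
    }
}
```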
Mailing list threads and issue tracker tickets where no clear decision was articulated were ignored for analytical purposes, as were discussions where it was judged that insufficient information was given for any decisions made to be clearly understood.

The conversations recorded in mailing list threads and issue tracker tickets contain the technical opinions and judgements of domain experts, including the core developers, and often contain explicit reference to PDF specifications and standards. Where there was no specific reference to a standard in a conversation, the topic of the discussion was used to determine relevance through comparison with other conversations on the topic explicitly linked to the PDF standards by contributors. At the end of the process, 111 mailing list threads and 394 issue tracker tickets had been identified for further analysis. Coding was also used at this stage to annotate the discussions, and particularly the decisions made, to help identify the nature of the problems being addressed, the relationship between the problems and the PDF standards and other PDF software, and the outcome of the decision-making process.

The corpus of 505 mailing list and issue tracker discussions was then analysed in depth by the first author to identify candidate semantic themes describing the types of decision being made, and to identify candidate thematic factors influencing the decisions made. The coding from the previous phase supported the grouping of decision types and the development of semantic themes. Additional coding undertaken at this stage was used to identify factors influencing decisions and to develop a set of candidate thematic factors.

In the subsequent phase, all authors discussed the candidate decision types and factors alongside illustrative discussions taken from the corpus. A set of four semantic themes and seven thematic factors was agreed, and their consistency with the larger body of evidence was reviewed by the first author.

4. Findings

This section describes the semantic themes identified through thematic analysis that categorise the decisions made by contributors to PDFBox regarding the maintenance of its interoperability. Each decision type is illustrated with examples. Thereafter we provide an account of the main factors that motivate and constrain the outcomes of the types of decision made.

---

3 Each email header contains a reference to the message it replies to. Sometimes the reference can be omitted when replying to a mailing list message.

Table 3
Types of software development decisions related to the PDF specifications and standards in the Apache PDFBox project.

| Decision Type | Description |
|---------------------------------------------------|-----------------------------------------------------------------------------|
| Improve to match de facto reference implementation | A decision taken in the context of improving or correcting PDFBox to match the de facto reference implementation. |
| Degrade to match de facto reference implementation | A decision taken in the context of degrading the compliance of PDFBox with a PDF specification or standard so that the behaviour matches that of an Adobe implementation. |
| Improve to match standard | A decision taken in the context of improving or correcting the behaviour of PDFBox to meet a PDF specification or standard. |
| Scope of implementation | A decision taken about the extent of the PDFBox implementation. |
Table 4
Apache PDFBox JIRA issue tracker tickets referenced in Section 4.1.

| Decision type | Issue tracker ticket |
|---------------------------------------------------|----------------------|
| Improve to match de facto reference implementation| PDFBOX-3513 |
| | PDFBOX-3589 |
| | PDFBOX-3654 |
| | PDFBOX-3687 |
| | PDFBOX-3738 |
| | PDFBOX-3745 |
| | PDFBOX-3752 |
| | PDFBOX-3781 |
| | PDFBOX-3789 |
| | PDFBOX-3874 |
| | PDFBOX-3875 |
| | PDFBOX-3913 |
| | PDFBOX-3946 |
| | PDFBOX-3958 |
| Degrade to match de facto reference implementation| PDFBOX-3929 |
| | PDFBOX-3983 |
| Improve to match standard | PDFBOX-3914 |
| | PDFBOX-3920 |
| | PDFBOX-3992 |
| | PDFBOX-4276 |
| Scope of implementation | PDFBOX-3293 |
| | PDFBOX-4045 |
| | PDFBOX-4189 |

4.1. Decision types

We identified four major types of decision related to the implementation of the PDF specifications and standards in the PDFBox project (see Table 3), each of which is described below with illustrative examples. We also provide descriptions of the thematic factors identified that, in combination, influence the decisions made.

4.1.1. Improve to match de facto reference implementation

Much of the work of PDFBox contributors is focused on trying to match the behaviour of Adobe’s PDF software. The PDFBox core developers and many contributors treat the Adobe PDF readers as de facto reference implementations of the PDF specifications and standards (e.g. PDFBOX-3738 and PDFBOX-3745 – PDFBox JIRA issue tracker tickets referred to in Section 4.1 are listed in Table 4), and use the maxim that PDFBox should be able to process any document the Adobe PDF readers can. As one core developer explains:

“There is the PDF spec and there are real world PDFs. Not all real world PDFs are correct with regards to the spec. Acrobat, PDFBox and many other libraries try to do their best to provide workarounds for that. We typically try to match Acrobat ...” (PDFBOX-3687).

The ISO 32000-2:2017 standard (ISO, 2017, pp. 18-19) identifies two classifications of PDF processing software: PDF readers and PDF writers. Accordingly, developers trying to match the Adobe implementations face two major challenges. The first is to be able to process the same input that Adobe software does. The second is to create output of similar quality to that produced by Adobe software. There are also two types of output from PDF software: the document created, and how a given document is rendered on screen or in print. To “try to match Acrobat” (PDFBOX-3687), documents created by PDFBox should, insofar as is possible, match those output by Adobe software so that they are rendered consistently by other software, and the expectation is that PDFBox, and software created using it, should also render documents with similar quality to the Adobe implementations (e.g. PDFBOX-3589 and PDFBOX-3752).

The convention in software that reads PDF files is to apply the Robustness Principle (Allman, 2011; Postel, 1981) so that documents that are not compliant with the PDF specifications and standards can be processed and rendered, insofar as is possible (e.g. PDFBOX-3789). Exactly what incorrect and malformed content should, or can, be parsed into a working document is not specified by the PDF specifications and standards.
The exemplar for developers is the behaviour of the Adobe Readers, as well as the behaviour of other PDF software.

PDF documents consist of four parts: a header, a body, a cross reference table, and a trailer. The header consists of the string “%PDF-” and a version number, followed, on a second line, by a minimum of four bytes with values of 128 or greater, so that any tool trying to determine what the file contains will treat it as binary data, and not text. The trailer consists of the string “%%EOF” on a separate line, immediately preceded by a number on one line giving the offset of the cross-reference table, with the string “startxref” on the line before that (see Fig. 1). A PDF parser reads the first line of a file, then searches for the “%%EOF” marker and works backwards to find the cross-reference table using the offset on the preceding line, and to read the trailer, which confirms the number of objects referenced in the table and gives the object reference of the root object of the document tree. The parser should then be able to read all the objects in the PDF file.

Where the cross-reference table is missing or damaged, PDF parsers may, according to the ISO 32000-1:2008 standard (ISO, 2008, p. 650), try to reconstruct the table by searching for objects in the file (see Fig. 2). In practice, Adobe software appears to apply the Principle of Robustness more widely, so that a wide range of problems, for example with fonts, are also tolerated by the parser.

4 The PDFBox JIRA issue tracker tickets referenced have URLs of the form https://issues.apache.org/jira/browse/PDFBOX-‘NNNN’ where ‘NNNN’ is the four digit number of the ticket. For example, PDFBOX-3738 has the URL https://issues.apache.org/jira/browse/PDFBOX-3738.

5 There are also ‘linearised’ PDF files, intended for network transmission, where the trailer and cross-reference tables precede the body.

6 The repair mechanism is why Adobe software applications sometimes offer the user the opportunity to save a newly opened document.
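As a concrete illustration of the parsing flow just described (and not of PDFBox’s actual parser), the following sketch locates the trailer of a well-formed file and extracts the cross-reference offset; the sentinel return value marks the point at which a robust reader would instead fall back to scanning the file for objects:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch of trailer parsing: find "%%EOF" near the end of the
// file and read the "startxref" offset recorded on the preceding lines.
final class TrailerLocator {

    static long findStartXref(Path pdf) throws IOException {
        byte[] bytes = Files.readAllBytes(pdf);
        int tailLength = Math.min(1024, bytes.length);
        String tail = new String(bytes, bytes.length - tailLength, tailLength,
                StandardCharsets.ISO_8859_1);

        int eof = tail.lastIndexOf("%%EOF");
        int startxref = tail.lastIndexOf("startxref", eof);
        if (eof < 0 || startxref < 0) {
            // Malformed trailer: a robust reader would now scan the whole
            // file for "N G obj" patterns to rebuild the cross-reference table.
            return -1L;
        }
        // The byte offset of the cross-reference table sits on the line
        // between "startxref" and "%%EOF".
        String offset = tail.substring(startxref + "startxref".length(), eof).trim();
        return Long.parseLong(offset);
    }
}
```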
The work required to resolve issues of this nature varies in scope. Sometimes the source code revision is relatively trivial: a simple change to make the parser more lenient because the document author’s intention is clear. For example, in PDFBOX-3874 a small change is made to a font parser so that it will accept field names in the font metadata that are capitalised differently from the specification. Similarly, in PDFBOX-3513, the PDFBox core developers identify an error in the ISO 32000-1:2008 standard as the underlying cause of an observed problem with PDFBox. One column of a table specifies two types (a name and a dictionary) for the value of an encoding dictionary for Type 3 fonts (ISO, 2008, p. 259), while the next column of the table clearly specifies that the field must be a dictionary. The contributor who encountered the document proposes a revision to the parser to accommodate the error (PDFBOX-3513). One core developer comments that “...we’ve never encountered a file with the problem you’ve presented.” Another core developer points out that there is no guidance in the specification on how to treat a Type 3 font that does not have an encoding dictionary. Instead of improvising a fallback encoding, the core developers argue that there may be a case for ignoring the font specified in the document, as it cannot be used reliably, and the parser is not revised, given the rarity of the problem.

Adobe and other PDF software sometimes exceed the specifications and standards. In PDFBOX-3654, for example, a file is found that renders in many other applications, but not in PDFBox. The problem is a font that is encoded in a hexadecimal format, and the standard is unequivocal on the subject:

“Although the encrypted portion of a standard Type 1 font may be in binary or ASCII hexadecimal format, PDF supports only the binary format.” (ISO, 2017, p. 351)

The source code is revised to support the font encoding, and the core developer processing the issue observes:

“So the font is incorrectly stored. But obviously, Adobe supports both, so we should too.” (PDFBOX-3654)

In some cases the Adobe software extends the specifications and standards through the implementation of additional functionality that reflects wider practice. Often the only documentation of the additional functionality is the implementation itself, and other implementers only discover the change when differences in behaviour are reported to them. For example, a report in PDFBOX-3913 shows that Adobe software and PDF.js7 process and render a Japanese URI, which PDFBox cannot. The ISO 32000-2:2017 standard specifies that the targets of URIs (links) should be encoded in UTF-8. In the reported document the URI is encoded in UTF-16, which is necessary to represent some Japanese characters used in domain names but exceeds the standard; both applications nevertheless process it. Revisions are made to PDFBox (documented in PDFBOX-3913, PDFBOX-3946, and PDFBOX-3958) to support UTF-16 for URIs and implement the same functionality as both Adobe and PDF.js.

PDFBox contributors also find instances where documents created by the software are not rendered as expected by Adobe’s software. In these cases there is typically a difference between the document model created by PDFBox and the model that Adobe expects. In some cases a great deal of work is required to understand how Adobe and other readers interpret the PDF document. In PDFBOX-3738 work is undertaken to understand how the output of digitally signed files is interpreted by Adobe and other reader products. The acquired knowledge is then applied so that PDFBox can create documents that can be read and rendered, with the digital signature displayed, by other PDF software. The developers also identify a related problem, documented in PDFBOX-3781, that affects documents with forms and digital signatures.

Merging PDF files can be a difficult problem for implementers to solve. PDFBOX-3875 records the challenges faced when merging two documents where the internal bookmarks are structured using slightly different representations in the document model. In the merged document some of the bookmarks do not work as expected. The initial assessment by one of the core developers is that the cause is within the PDFBox source code and is “…probably a bug. Not the kind that will be fixed quickly ...”. One approach used by the core developers to evaluate how best to solve the problem is to merge the documents using other applications, including Adobe software, and to examine the document created following the merge.
Work is started to try to create a viable solution by emulating the document that results from merging the files using Adobe software, but further problems are encountered and the work is not completed.

4.1.2. Degrade to match de facto reference implementation

As noted already, developers of PDF software, including the PDFBox developers, tend to view Adobe PDF software implementations as a gold standard. However, Adobe’s software developers do not always implement the PDF specifications and standards in the way that others might and, on occasion, implement solutions that can be seen as incorrect. Consequently, developers of PDF software then need to determine how they might degrade the adherence of their software to the PDF specifications and standards to match Adobe’s implementations.

PDFBOX-3929 begins with a discussion on the PDFBox users mailing list where a user observes that PDF documents created by PDFBox with floating point numbers used for field widget border widths are rendered by Adobe XI and Adobe DC without a border (Users-2 and Users-3 in Table 5). The borders of other annotation types are unaffected.

7 PDF.js is a widely used open source PDF reader implemented in JavaScript, see https://mozilla.github.io/pdf.js/.

The width of a border drawn around an annotation, such as a form field, is defined in PDF documents in two ways: a border array holding three or four values, or in some cases a border style dictionary (an associative array) that includes a value for the width of the border in points. In both cases the value specifying the width is defined as a number. The PDF specifications and standards define two numeric types: integer objects and real objects. The ISO 32000 standards then say “... the term number refers to an object whose type may be integer or real.” (ISO, 2008, p. 14; ISO, 2017, p. 24). ISO 32000-2:2017, for example, is explicit where fields are required to hold integer values, and uses the term number for other numeric fields.

Both versions of the ISO 32000 standard define the border array using the following sentence:

“The array consists of three numbers defining the horizontal corner radius, the vertical corner radius, and border width, all in default user space units.” (ISO, 2008, p. 384; ISO, 2017, p. 465)

Accordingly, the interpretation of the standards used in PDFBox agrees with the standard: the border width can be specified with a floating point number. However, the Adobe reader software expects an integer, and ignores non-integer values, such as 3.0, by treating them as having a value of zero. Consequently, the PDFBox implementation was revised so that annotations in documents created by PDFBox will be rendered with borders by Adobe DC. A bug report was also made to Adobe support, saying that the standard had been interpreted incorrectly.

A closely related issue is found in a thread on the users mailing list (Users-4) where a developer reports that the Adobe reader implementations behave in an unexpected way. This time the concern is the border drawn around a URI action annotation, or link. The border is defined in the standard as described above, but the Adobe reader implementations interpret the values 1, 2, and 3 as meaning a thin, medium and thick border respectively. The PDFBox API documentation is updated to describe how the Adobe reader implementations interpret the border width value.
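The practical consequence for code that creates annotations can be illustrated with a sketch against the PDFBox 2.0 API; the rounding policy shown is illustrative rather than the project’s actual fix, which revised how whole-number widths are serialised:

```java
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationLink;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDBorderStyleDictionary;

// Illustrative workaround for the behaviour reported in PDFBOX-3929: Adobe XI
// and DC ignore non-integer border widths such as 3.0, so ensure the width is
// a whole number that can be written as a PDF integer object.
final class VisibleBorders {

    static PDAnnotationLink linkWithVisibleBorder(float requestedWidth) {
        PDAnnotationLink link = new PDAnnotationLink();
        PDBorderStyleDictionary style = new PDBorderStyleDictionary();
        style.setStyle(PDBorderStyleDictionary.STYLE_SOLID);
        style.setWidth(Math.round(requestedWidth)); // whole number, e.g. 3 not 2.7
        link.setBorderStyle(style);
        return link;
    }
}
```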
A contributor reports in PDFBOX-3983 that Acrobat Reader fails to display some outlines and borders where the miter limit is set to a value of zero or less. The miter limit indicates how junctions between lines should be drawn. The ISO 32000-1:2008 standard states:

“Parameters that are numeric values, such as the current colour, line width, and miter limit, shall be forced into valid range, if necessary.” (ISO, 2008, p. 124)

The statement was revised in ISO 32000-2:2017 by the replacement of “forced” with “clipped” (ISO, 2017, p. 157).

Accordingly, one interpretation might be that a compliant PDF reader would be able to display a document correctly regardless of the miter limit value recorded, because it would automatically correct the value. However, Adobe implementations appear not to correct the value. The user reporting the problem supplies a patch so that documents created by PDFBox will contain miter limit values that are positive, and this simple fix allows Adobe software to display the document. OpenHTMLtoPDF, another OSS project, has also encountered the same problem and takes similar action.8

8 https://github.com/danfickle/openhtmltopdf/issues/135.
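The contributed fix reduces, in essence, to a clamp applied before the value is written; the following sketch assumes the PDF default miter limit of 10 as the substitute value, which may differ from the actual choice made in the patch:

```java
final class GraphicsStateSanitiser {
    // Adobe's readers fail to display borders when the miter limit is zero or
    // negative (PDFBOX-3983), so clamp to a positive value before writing.
    // The substitute value (the PDF default of 10) is illustrative.
    static float safeMiterLimit(float requested) {
        return requested > 0f ? requested : 10f;
    }
}
```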
4.1.3. Improve to match standard

The PDFBox implementation is also revised to meet the requirements of the PDF standards and normative references, independently of the need to match the performance of Adobe products.

The use of multi-byte representations of characters in Unicode character encodings such as UTF-16 requires careful processing by PDF parsers, because some single byte values can be misinterpreted. The single byte value 0x20 represents the space character in fonts encoded in one byte. In multi-byte character encodings the byte 0x20 may be part of a character, and so should not be treated as a single byte. Two kinds of operator can be used in PDF documents to position text, one of which should be used with multi-byte font encodings so that single byte values that form part of multi-byte characters are not misinterpreted. A patch is contributed in PDFBOX-3992 so that PDFBox fully supports the operator used to justify multi-byte encoded text, to comply with the ISO 32000-1:2008 standard.

The PDF/A group of standards define an archive format for PDF. The demands of the standards are high, and compliance requires a great deal of attention to detail during document preparation. In general, the PDF/A standards constrain the types of content that can be present in compliant files, and sometimes make very precise demands on the quality of embedded resources. The veraPDF project develops a freely available validator for PDF/A files. PDFBox also implements ‘preflight’ functionality to validate documents against the requirements of PDF/A-1b (the ISO 19005-1:2005 standard), and there are examples where the implementation is revised to match the performance of the veraPDF validator when differences are found. For example, a bug in the preflight validator is found in PDFBOX-4276 and the functionality is corrected so that the incorrect output is now detected as veraPDF would detect it. In PDFBOX-3920 a user reports that font subsets created by PDFBox do not include all the data required by the PDF/A-2 standard (ISO 19005-2:2011). The PDFBox source code is modified so that the output meets the standard.

The number of revisions to the PDF specifications and standards means that occasionally PDFBox is found not to implement a particular feature, or not to capture all the data in a PDF document. A contributor reports a problem with PDFBox where a field is ignored during parsing, which leads to content being rendered that is supposed to be hidden. The user provides a patch in PDFBOX-3914 which forms the basis of an update to the source code so that the field is imported and the document rendered correctly.

4.1.4. Scope of implementation

The core developers also make decisions about the scope of the software implemented by the PDFBox project. The question of what functionality forms the scope of the PDFBox implementation arises in some bug reports and feature requests, and has multiple dimensions.

PDFBox is not intended to be a comprehensive solution for creating, processing or rendering PDF documents. The project charter, or mission statement, says:

“The Apache PDFBox library is an open source Java tool for working with PDF documents. This project allows creation of new PDF documents, manipulation of existing documents and the ability to extract content from documents. Apache PDFBox also includes several command-line utilities. Apache PDFBox is published under the Apache License v2.0.” (Apache PDFBox, 2019)

PDFBox relies on some external libraries to provide functionality, especially in the area of image processing. There is no need for the PDFBox project to reinvent the wheel, particularly in technically demanding domains. A further difficulty is that image processing provision within the core Java libraries is incomplete, and varies between Java versions. Some functionality, such as the JPEG 2000 codec, is no longer maintained and is difficult for OSS implementers to adopt because of the licence used and potential patent issues (discussed further in Section 4.2.6). Java provision for image processing is changing and, with Java v9, functionality is gradually being returned to the core libraries. However, the JPEG 2000 codec remains outside the main Java libraries. Further, the PDFBox core developers often recommend the use of the Twelve Monkeys plugin9 for image processing, in particular because it processes CMYK images, which PDFBox does not.

Some areas of work are outside the current scope of PDFBox, including the implementation of rendering for complex scripts. There is some provision, and some developers have contributed code for non-European languages where they have expertise (for example, Users-5). In some cases the layout of the languages is sufficiently close to Latin scripts that there is no need for additional provision, provided the fonts are correct, as shown in PDFBOX-3293. However, for many languages, including Arabic and those of the Indian subcontinent, there is a need to implement code to position the glyphs using GSUB and GPOS tables. In PDFBOX-4189 a user provides much of the functionality needed to support GSUB tables for Bengali. The complexity of the task is clear from the discussions reviewing and accepting the source code.

Decisions are also made about the cause of observations, and whether what is observed is the result of a problem with PDFBox. Where the issue lies with PDFBox, decisions are then made about resolving the problem. Sometimes the erroneous observation results from other software.
A user reports in PDFBOX-4045 a difference between the assessments made by Adobe preflight and PDFBox concerning a document’s compliance with the PDF/A-1b standard. Adobe XI identifies inconsistencies in the glyph widths for one font in the document. After investigation the core developers determine that there is no error in PDFBox, and that Adobe X agrees that the document is compliant. Given the inconsistent assessments made by Adobe X and XI, and that inspection of the font does not show the issue reported by Adobe XI, the PDFBox core developers conclude there is a problem with the implementation of preflight in the particular version of Adobe XI used.

9 https://github.com/haraldk/TwelveMonkeys.

4.2. Factors influencing decision-making

Common to the decision types observed is a set of considerations or factors that influence the outcome of the decision-making process (see Table 6).

Table 6
Factors influencing decision-making in the Apache PDFBox project.

| Factor | Description |
|-------------------------|-----------------------------------------------------------------------------|
| Workforce | The availability of contributors to do work. |
| Maintenance Risk | The maintenance burden for the project of a feature implementation. |
| Expertise | The collective expertise of the contributors to the project. |
| Sustainable Solution | The long-term viability of a technical solution. |
| Capability | The ability to make relevant and meaningful changes in a given context. |
| Intellectual Property Rights | Matters pertaining to copyright, patents and licensing. |
| Java Interoperability | The consequences for interoperability of revisions to Java. |

4.2.1. Workforce

Companies choose to use the PDFBox software and, where appropriate for their needs, contribute to its improvement through the work of their developers. As noted, the core developers of PDFBox are few in number and are, as they emphasise, not paid for their work on PDFBox:

“The project is a volunteer effort and we’re always looking for interested people to help us improve PDFBox. There are a multitude of ways that you can help us depending on your skills.” (Apache PDFBox, 2019)

With limited time available to them (Targett, 2019), the PDFBox core developers concentrate their efforts (Khudairi, 2019) on areas of the software where work is a priority, unless other developers in the community are able to contribute.

The example given previously of work on a solution for a document merging problem (PDFBOX-3875)10 that halts may be explained by the limited workforce being focused on other, more achievable tasks, as illustrated by a core developer’s comment on another task:

“I had hoped to implement that but given current commitments I have it is unlikely that I’m able to do it in the short term (I’m trying to concentrate on resolving AcroForms related stuff in my spare time for the moment[1]).” (PDFBOX-3550)

Another example of the influence of the available workforce on decision-making can be found in PDFBOX-3875, where a developer working for a company wants a problem resolved. The problem is challenging and will take time to understand and resolve. The developer reporting the problem is given three choices: to adopt and use another OSS application; implicitly, to buy a licence for Adobe Professional; or to contribute the fix themselves, either directly or by commissioning other developers to do the work.

4.2.2. Maintenance risk

The notion of maintenance risk can be related to the factors of expertise and workforce.
Core developers will sometimes express, or imply, concerns that make them unwilling to accept a solution. For example, in PDFBOX-3962 a user proposes a solution that repairs the unicode mappings in one PDF document so that it can be rendered. The core developers identify that the solution resolves a special case, and that further work would be required to develop a viable solution for the Java 9 libraries. Another concern articulated in some requests for support for complex scripts is that the core developers do not have the skills to maintain the functionality. A lengthy discussion of the issue can be found in PDFBOX-3550, where the core developers identify some central challenges to creating a solution. The main concern in both cases is that by providing additional functionality that cannot be maintained, or is a challenge to maintain, either in terms of the effort required or the necessary expertise, there is a risk to the utility of the software and, perhaps, the viability of the project.

10 Issue tracker tickets referenced in Section 4.2 are given in Table 7.

4.2.3. Expertise

The implementation of PDF software requires expertise in a wide range of areas in addition to PDF itself. Limitations in the available expertise help determine what work can be done by contributors. One implication, already noted, is the reluctance to maintain source code in areas where there is no, or limited, expertise amongst the core developers. Another is that some areas of functionality cannot be developed. For example, a user asks about compressing CMYK JPEG images in PDFBOX-3844. The core developer responds by saying:

“There is no JPEG compression from CMYK BufferedImage objects out of the box, i.e. Java ImageIO doesn’t support it, and we don’t have the skills, so that I’ll have to close as “won’t fix” this time.” (PDFBOX-3844)

The alternative suggested in PDFBOX-3844 is to investigate the Twelve Monkeys project, which builds on the Java ImageIO functionality.

There is also a great deal of expertise within the PDFBox community, which can enable the implementation of solutions. In PDFBOX-4095 one contributor provides a proposed solution to a challenging problem. After some work evaluating the proposed change, which is not going well, another contributor suggests a simple revision that resolves the problems. Similarly, a complex image rendering problem is solved with the help of advice from a contributor in PDFBOX-4267, and another contributor implements code to process YCbCr CMYK JPEG images in PDFBOX-4024.

Expertise alone, however, is not sufficient to provide a solution to a problem in all cases. The discussion in PDFBOX-4189 shows there is considerable expertise within the user community and among the core developers about fonts and how to render complex scripts. Key factors that have prevented the work being done previously have been not only a shortage of available workforce, but also a lack of expertise in the target language that would provide sufficient understanding to distinguish between good and bad solutions:

“Many complex scripts (such as Arabic) require shaping engines which require deep knowledge of the languages in order to follow the rules in the OpenType tables.” (PDFBOX-3550)

4.2.4. Sustainable solution

There are often implementation choices to be made when resolving a problem, and a longer-term solution is generally preferred to a short-term fix or workaround.
In PDFBOX-3300 concerns are reported about the way a font subset has been created prior to embedding it in a document. A specific solution is proposed that provides a way of resolving the problem. Another developer identifies that the optimal solution is to resolve some problems in the CMap parser,11 a more sustainable solution than a patch providing a specific workaround. In this case the developers are able to create a generic solution that better addresses the font standards, and thereby the PDF standards, and provides a longer-lived solution.

4.2.5. Capability

A key factor in decisions concerns whether the project is able to correct the problem that is causing the observed behaviour. The examples given in Section 4.1.2, where the PDFBox implementation was degraded from meeting the standard to match the behaviour of Adobe’s software, illustrate one aspect of capability as a factor. In those cases the ‘incorrect’ implementation could not be revised, and only a revision to PDFBox could ensure that the documents created would be rendered as expected by Adobe’s implementations. In other cases bugs are found in external libraries or infrastructure that have an impact on PDFBox. Often a workaround will be found, or an alternative library recommended. For example, PDFBOX-3641 describes a situation in which PDFBox uses a core Java library in a way that triggers a bug in the Java implementation. The code in PDFBox is revised to prevent the bug being triggered. The Java bug is also reported.12

4.2.6. Intellectual property rights

PDF documents can include technologies and artifacts whose use is constrained by copyright, patents or licences. In addition, PDFBox is implemented in Java, which during its lifetime has moved from closed source, to largely open source, to some variants (e.g. OpenJDK and derivatives like Amazon Corretto) that are entirely open source. An implementation of the JPEG 2000 codec was included in extensions to the Java libraries. During Sun Microsystems’ process to make Java open source, the codec, along with other image codecs, was released as a separate library known as ImageIO. The licence used for the implementation of the JPEG 2000 codec is not an Open Source Initiative (OSI) approved open source licence, and some consider the licence incompatible with OSS licences such as the GPL v3 and the Apache Licence v2.0.13 In addition, there are concerns amongst OSS developers about the potential for patent claims related to JPEG 2000, though the concerns are diminishing with the passage of time. Most of the image codecs in the ImageIO library have been reincorporated into the Java libraries in OpenJDK since v9, but the JPEG 2000 codec has not. Consequently, JPEG 2000 support in PDFBox, where it is required by users, relies on the jai-imageio implementation14 of the codec, which is no longer maintained. A user reports using the OpenJPEG implementation of JPEG 2000 in PDFBOX-4320. However, OpenJPEG is implemented in C and can only be used as native code, which may not be suitable for some deployment contexts. The development of a replacement OSS JPEG 2000 codec is inhibited by the resources, including expertise and finance, required to implement a large and complex standard.15

11 A CMap is a table in a font file that maps character encodings to the glyphs that represent them.

12 https://bugs.openjdk.java.net/browse/JDK-8175984

13 For example the opinion expressed at: https://github.com/jai-imageio/jai-imageio-jpeg2000.

14 https://github.com/jai-imageio/jai-imageio-jpeg2000.

15 JPEG 2000 is defined in ISO/IEC 15444, which consists of 14 parts (see Lundell et al., 2018).
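Because the codec ships outside the JDK, an application needing JPX-encoded images typically probes at run time for a registered ImageIO plugin; in this sketch the format name is an assumption about how the jai-imageio-jpeg2000 codec registers itself, and may differ between codec implementations:

```java
import javax.imageio.ImageIO;

// Probe for a JPEG 2000 ImageIO plugin on the classpath, e.g. the
// jai-imageio-jpeg2000 codec that PDFBox relies on for JPX images.
// The format name "jpeg2000" is an assumption about the plugin's
// registration, not something mandated by ImageIO itself.
final class Jpeg2000Support {
    static boolean available() {
        return ImageIO.getImageReadersByFormatName("jpeg2000").hasNext();
    }
}
```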
However, OpenJPEG is implemented in C and can only be used as native code, which may not be suitable for some deployment contexts. The development of a replacement OSS JPEG 2000 codec is inhibited by the resources, including expertise and finance, required to implement a large and complex standard.\footnote{JPEG 2000 is defined in ISO/IEC 15444, which consists of 14 parts (see Lundell et al., 2018).}

The ISO 19005-1:2005 standard (ISO, 2005, p. 11) for archival PDF documents mandates the embedding of fonts, including the standard 14 fonts,\footnote{PDF specifications require 14 fonts to be present on systems that render documents, e.g. ISO 32000-1:2008 (ISO, 2008, p. 256).} or substitute fonts, in files so that the document contains all the resources required to render it. The requirement is stated as: “Only fonts that are legally embeddable in a file for unlimited, universal rendering shall be used.” (ISO, 2005, p. 10). The requirement can be problematic because many fonts have licences that do not permit redistribution. The matter is discussed in PDFBOX-3618. The legality of the embedded fonts is the responsibility of the document creator. Both the PDF/A-1 and PDF/A-2 standards include a note that clarifies the need for the legal use of any font to be clearly and verifiably stated:

“This part of ISO 19005 precludes the embedding of font programs whose legality depends upon special agreement with the copyright holder. Such an allowance places unacceptable burdens on an archive to verify the existence, validity and longevity of such claims.” (ISO, 2005, p. 11; ISO, 2011, p. 15).

4.2.7. Java interoperability

There are also problems concerning interoperability with Java that influence the solutions implemented in PDFBox. Some relate to areas where Java is used to provide support required by the PDF standards, such as image processing. An example is found in PDFBOX-3549, where Java versions have differing capabilities to process ICC colour spaces, and some versions have bugs that affect the handling of ICC colour spaces. During the period of PDFBox activity investigated, three new major versions of Java were released, and many revisions were made to each version. There is also some evidence in the mailing lists and Jira tickets that some users are still using Java 5, which was already obsolete at the start of the period investigated.

4.3. Summary

Through analysis of two years of activity in the PDFBox project related to implementation of the PDF specifications and standards, we have identified four decision types related to development of the project software and seven factors that influence those decisions. The four decision types concern adapting the software to emulate the behaviour of Adobe’s PDF software, implementing the PDF standards, and the scope of the PDFBox implementation. The seven factors act in combination to facilitate and constrain development activity, especially the interplay between expertise and workforce.

5. Analysis

Much of the work of PDFBox contributors consists of trying to match the implementation of Adobe PDF reader software. The reasons for matching Adobe implementations are mostly clear, yet emulating Adobe’s software is clearly challenging, and solutions that might reduce the extent of the challenges and the risks, including validators, are themselves challenging to create.

5.1. The challenges of developing PDF parsers
The PDF specifications and standards specify that PDF software may try to reconstruct files where the cross-reference table is incorrect or has been omitted. In practice the Principle of Robustness is applied in Adobe’s PDF software so that PDF files that are not well-formed can often be rendered. The developers of other PDF applications are obliged to follow Adobe’s lead. If the developers of non-Adobe PDF software did not implement parsers that behaved similarly to Adobe’s, their products would quickly become irrelevant, because PDF users often believe that documents that can be read and rendered by Adobe software must meet the standard (Amiouny, 2016; Lehtonen et al., 2018). The extent to which PDF applications and libraries are expected to tolerate errors in documents is, in effect, determined by the behaviour of Adobe’s software, which creates a number of challenges for developers of PDF software.

Firstly, non-Adobe developers are left with the time-consuming puzzle of trying to match the Adobe implementations. Indeed, the puzzle includes an element of chance, because differences in performance are discovered only when a PDF document containing a triggering problem is processed. Secondly, there are clearly security concerns in this approach. Parsing is arguably one of the more challenging software engineering tasks. In the case of PDF, the core specifications and standards are extensive and complex, and include a large number of normative references for component file and media types, all of which need to be parsed by either a PDF implementation or its dependencies. PDFBox has been the subject of Common Vulnerabilities and Exposures (CVE) notices related to parser implementation\footnote{For example CVE-2018-8036 and CVE-2018-11797.}, as have other PDF software implementations. The core developers are therefore making decisions about security as part of those around the viability of the software when trying to match the behaviour of Adobe’s software.

Some practitioners argue that a small revision made in the ISO 32000-2:2017 standard concerning the structure of the file, which more precisely defines the relationship between the header and the end-of-file marker, largely put an end to the need to apply the Principle of Robustness in PDF parsing (Amiouny, 2017). However, though the changes in the standard are important and may ease some of the burden on developers, we do not share the optimism, because the changes only apply to the structure of documents that are or claim to be PDF v2.0 compliant. Of course, there remain in circulation all the documents created during some 25 years of PDF usage, as well as those documents that will continue to be created which are compliant with earlier specifications and standards. Further, the Principle of Robustness is applied to tolerate non-conformance with normative standards of PDF, such as fonts and images, as well as minor PDF implementation errors. Given the history of malformed PDF files and the challenges of standards compliance, the fact that a document claims to be PDF v2.0 and complies with the structural requirements of ISO 32000-2:2017 does not guarantee that either the document or its components comply with the standard. Consequently, the need for tolerant parsing remains, as the sketch below illustrates.
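To make the discussion of tolerant parsing concrete, the following minimal sketch shows one way a parser might recover object positions from a damaged file by scanning for indirect object headers when the cross-reference table is unusable. It is written in Python for brevity and illustrates the general technique only, not PDFBox's actual implementation (which is in Java); the file name is a placeholder.

import re

# Matches the header of an indirect object, e.g. b"12 0 obj".
OBJ_RE = re.compile(rb'(\d+)\s+(\d+)\s+obj\b')

def rebuild_xref(path):
    """Return a dict mapping object number -> byte offset of its header.

    A naive scan like this can also match byte sequences inside content
    streams; a robust implementation would verify each candidate before
    trusting it. Later definitions overwrite earlier ones, mirroring the
    way incremental PDF updates supersede older objects.
    """
    with open(path, 'rb') as f:
        data = f.read()
    offsets = {}
    for m in OBJ_RE.finditer(data):
        offsets[int(m.group(1))] = m.start()
    return offsets

offsets = rebuild_xref('damaged.pdf')
print(f'recovered {len(offsets)} indirect objects')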
One improvement might be the creation of reference implementations and validation tools, practices that have been adopted in the development of open standards, for example in the IoT domain as noted in Section 2.3 (e.g. Watteyne et al., 2016). Validation tools for fonts could help ensure that font creators build font files that contain sufficient, accurate information for other software to use the font file, and that implementers of font parsers have a means by which to evaluate their software. Further, validation tools for PDF documents and a reference implementation for PDF would help the developers of PDF software create more interoperable applications with less effort and, possibly, reduce the security risks arising from the need to parse malformed documents. However, in practice PDF validators are difficult and expensive to implement. The veraPDF (veraPDF, 2019) PDF/A validator, for example, was created during a European Union funded project, and the PDFTools validator is proprietary licenced software.\(^\text{19}\) The problem remains, also, that solutions such as validators are forward-looking, and cannot address the challenge of processing non-compliant PDF files created during the last 25 years that still need to be read. There is, though, a case for introducing validators and reference implementations to help ensure that PDF files created in the future pose fewer problems for software developers (Lundell and Gamalielsson, 2018). Furthermore, tools such as validators provide a reference point against which to try to improve the quality of existing documents, exemplified by the work of Lehtonen et al. (2018) with applications in PDF file preservation. A minimal sketch of the kind of structural check such tools automate follows below.
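As a small illustration of the structural checking that validators automate, the sketch below probes only the two file-structure features discussed in this section: the %PDF- version header and the %%EOF marker. It is a toy, not veraPDF's approach; the 1024-byte tolerance window reflects Adobe's documented leniency rather than the strict ISO 32000-2:2017 rule, and the file name is a placeholder.

def probe_structure(path):
    """Return a list of structural problems found in a PDF file."""
    with open(path, 'rb') as f:
        head = f.read(1024)
        f.seek(0, 2)                    # seek to the end to learn the file size
        size = f.tell()
        f.seek(max(0, size - 1024))
        tail = f.read()
    issues = []
    if not head.startswith(b'%PDF-'):
        # ISO 32000-2:2017 requires the header at the start of the file;
        # Adobe's implementations tolerate it within the first 1024 bytes.
        if b'%PDF-' in head:
            issues.append('header not at byte 0 (tolerated in practice, not by PDF 2.0)')
        else:
            issues.append('no %PDF- header found')
    if b'%%EOF' not in tail:
        issues.append('no %%EOF marker near the end of the file')
    return issues

for problem in probe_structure('example.pdf'):
    print('warning:', problem)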
5.2. Practice vs standard

Other challenges for PDFBox contributors arise from the development of practice, particularly by Adobe, where that practice moves away from the standards. PDFBOX-3913 records the discovery that Adobe’s PDF software and PDF.js exceed the ISO 32000-1:2008 standard by implementing UTF-16 encoding for destination URIs in links. The bug report dates from August 2017 and is contemporary with the publication of ISO 32000-2:2017, which specifies the use of UTF-8 encoding (ISO, 2017, p. 515). Given the use of UTF-16 encoded URIs, which have been part of HTML 5 since 2011,\(^\text{20}\) it is outwardly reasonable for Adobe and others to follow practice. However, it remains an open question why UTF-16 encoding for URIs was not part of the ISO 32000-2:2017 standard.

A further issue found in some PDFBox Jira issue tickets is a grey area between the standard and how a document is presented. The PDF specifications and standards apply to the quality of the document, and the manner in which some parts of the document are to be rendered (for example character spacing). However, the standard does not specify how software might render all of the document. The examples given above to illustrate degradation of compliance with the standard to match Adobe’s implementation are of particular interest. The ISO 32000-1:2008 and ISO 32000-2:2017 standards are clear on how the value of the border width should be represented in a compliant PDF document. As the PDFBox core developers identified, the representation of the values of border widths within the document does not comply with the PDF specifications and standards because valid non-integer values are not accepted by Adobe software. However, the presentation on screen by Adobe software of border widths defined in the document is an interpretation of the values in the document, and one that may not need to be followed slavishly.

5.3. Project sustainability

The PDFBox core developers generally act to improve the functionality of the project software. However, there are times when their actions appear to be constrained by the long-term interests of the project. Some decisions, for example around the support for complex scripts and graphics processing, have ready explanations in that the core developers do not always have the necessary skills, or time, to implement the required solutions. There are also some activities where there may not be a clear decision stated, but the core developers, and some other contributors, do not complete tasks because they have run out of time, or have other, higher priority, tasks to attend to. It may be inferred that the developers are acting in the long-term interests of the project to create software that works and can be maintained. The concern is that if the project contributors overreach their collective abilities and their capacity to develop and maintain good quality software, then the project may cease to be viable. There are parallels between the decision-making of the core developers, which reflects their capacity to make and maintain specific changes, and the decisions made within a business to maintain itself as a going concern. Implicit is the idea that the PDFBox software remains marketable, i.e. that the software is sufficiently compliant with the PDF specifications and standards that it is useful to many users, and the project will therefore continue to attract users and contributors without the need to take risks by making unsustainable changes.

It should be recognised that this observed process of self-regulation is precisely that: self-regulation. There is no company or group of companies driving the development of PDFBox and making strategic decisions. There are no dedicated managers making strategic decisions. Instead, what appear to be sensible, level-headed strategic decisions that might be made by a business are being made in the small by a small collective of individuals and company developers collaborating on the development and maintenance of PDFBox.

5.4. Limitations

The case study reported in this article describes and analyses the activity of practitioners collaborating in an OSS community to develop software that can create and process PDF documents. We acknowledge the limitations to the transferability of our findings that arise from the nature of the study. However, we conjecture that the findings may be representative of the challenges faced and decision types made in other OSS projects and, perhaps, businesses implementing standards-based interoperable software, in particular where a dominant implementation contributes to the discourse on the meaning of interoperability. Further, the factors informing the decisions made relate to technical and resource concerns that appear to be relevant for other businesses and organisations.

6. Conclusions

The study reports findings from an investigation of the practical decisions concerning interoperability made during a two-year period by contributors to a community open source software project (Apache PDFBox).
The PDFBox project develops and maintains software that can be used to create and process documents that conform to multiple PDF specifications, some of which have been published as ISO standards. Four types of decision made by contributors to maintain the interoperability of the PDFBox software were identified through thematic analysis. Decisions on software interoperability concern compliance with the PDF specifications and ISO standards, and matching or mimicking the behaviour of the de facto reference implementation where that behaviour is unrelated to the standards or in conflict with them. In conjunction, contributors also make decisions about the scope of the PDFBox implementation. Contributors to the PDFBox project are able to deliver high quality software through a careful and, at times, conservative decision-making process that allows an often agile response to the discovery of problems with the project’s software and to changes in the dominant proprietary implementation. At the same time, the decisions made are informed by factors including resource and technical considerations which contribute towards the longer-term viability of the project and the software created.

\(^{19}\) PDFTools 3-Heights Validator: https://www.pdf-tools.com/pdf20/en/products/pdf-converter-validation/pdf-validator/.

\(^{20}\) https://www.w3.org/TR/2011/WD-html5-20110525/urls.html.

In summary, the study makes the following contributions to the existing body of knowledge in this area:

- A rich and detailed account of types of decisions made within a community OSS project to maintain software interoperability;
- An account of technical and non-technical factors that motivate and constrain software development activity in the project and support project sustainability.

This study provides a rich illustration and analysis of the challenges faced by contributors to a community OSS project to implement and maintain interoperable, standards-based software. The study has shown how the contributors to PDFBox are able to meet challenges arising from the demands of the technical specifications and standards, and the performance of a de facto reference implementation. The study also finds that through awareness of the resources available to the project, the project is able to maintain interoperable software of continuing technical relevance. A topic for future research is to understand the extent to which the challenges and decision types identified, and the factors influencing those decisions, are representative of those faced by other organisations — businesses and OSS projects — developing standards-based implementations.

Declaration of competing interest

None.

Acknowledgements

This research has been financially supported by the Swedish Knowledge Foundation (KK-stiftelsen) and participating partner organisations in the LIM-IT project. The authors are grateful for the stimulating collaboration and support from colleagues and partner organisations.

References

Ahlgren, B., Hidell, M., Ngai, E.C.-H., 2016. Internet of things for smart cities: interoperability and open data. IEEE Internet Comput. 20, 52–56. doi:10.1109/MIC.2016.124.

Allman, E., 2011. The robustness principle reconsidered. Commun. ACM 54, 40–45. doi:10.1145/1978542.1978557.

Amiouny, D., 2016. Buggy PDF Files, Should We Try to Fix Them?. Amyuni Technologies Inc. http://blog.amyuni.com/?p=1627. Accessed: 2019-05-15.

Amiouny, D., 2017.
PDF 2.0 and the Future of PDF: Takeaways from PDF Days Europe 2017. Amyuni Technologies Inc. http://blog.amyuni.com/?p=1702. Accessed: 2019-05-14.

Apache PDFBox, 2019. Apache PDFBox: a Java PDF Library. The Apache Software Foundation. https://pdfbox.apache.org/. Accessed: 2019-09-17.

Apache Tika, 2019. Apache Tika — a Content Analysis Toolkit. Apache Software Foundation. https://tika.apache.org/. Accessed: 2019-06-05.

ASF, 2019. The Apache Software Foundation. The Apache Software Foundation. http://www.apache.org/. Accessed: 2019-06-05.

ASF, 2019. Apache Software Foundation Public Mailing List Archives. Apache Software Foundation. http://mail-archives.apache.org/. Accessed: 2019-06-05.

Atlassian, 2019. Jira REST APIs. Atlassian. https://developer.atlassian.com/jira/devnet/jira-apis/jira-rest-apis. Accessed: 2019-04-15.

Bitergia, 2019. GrimoireLab. Bitergia. https://chaoss.github.io/grimoirelab/. Accessed: 2019-08-03.

Black Duck, 2019. Apache PDFBox. Black Duck Software Inc. https://www.openhub.net/p/pdfbox/. Accessed: 2019-03-08.

Bogk, A., Schöpl, M., 2014. The pitfalls of protocol design: attempting to write a formally verified PDF parser. In: 2014 IEEE Security and Privacy Workshops, pp. 198–203. doi:10.1109/SPW.2014.36.

Bouvier, D.J., 1995. Versions and standards of HTML. SIGAPP Appl. Comput. Rev. 3, 9–15. doi:10.1145/238228.238232.

Bradner, S., 1996. The internet standards process — revision 3. Internet Engineering Task Force. https://www.rfc-editor.org/rfc/rfc2026.html. Accessed: 2019-09-19.

Bradner, S., 1999. The internet engineering task force. In: DiBona, C., Ockman, S., Stone, M. (Eds.), OpenSources: Voices from the Open Source Revolution. O'Reilly & Associates, pp. 28–30.

Braun, V., Clarke, V., 2006. Using thematic analysis in psychology. Qual. Res. Psychol. 3, 77–101. doi:10.1191/1478088706qp063oa.

Butler, S., Gamalielsson, J., Lundell, B., Brax, C., Sjöberg, J., Mattsson, A., Gustavsson, T., Feist, J., Lönnroth, E., 2019. On company contributions to community OSS projects. IEEE Trans. Softw. Eng. (early access), 1–11. doi:10.1109/TSE.2019.2910305.

CEF Digital, 2019. Start Using Digital Signature Services (DSS). CEF Digital. https://ec.europa.eu/cefdigital/wiki/pages/viewpage.action?pageId=77177034. Accessed: 2019-04-29.

Davies, E.B., Hofmann, J., 2004. IETF Problem Resolution Process. Internet Engineering Task Force. https://www.rfc-editor.org/rfc/rfc3844.html. Accessed: 2019-09-19.

De Coninck, Q., Michel, F., Piraux, M., Rochet, F., Given-Wilson, T., Legay, A., Pereira, O., Bonaventure, O., 2019. Pluginizing QUIC. In: Proceedings of the ACM Special Interest Group on Data Communication. ACM, New York, NY, USA, pp. 59–74. doi:10.1145/3341302.3342078.

Eclipse Foundation, 2019. Californium (Cf) CoAP framework. Eclipse Foundation. https://www.eclipse.org/cf/. Accessed: 2019-10-03.

Eclipse Foundation, 2019. Eclipse Leshan. The Eclipse Foundation. https://www.eclipse.org/leshan/. Accessed: 2019-10-03.

Eclipse Foundation, 2019. Eclipse Wakaama. The Eclipse Foundation. https://www.eclipse.org/wakaama/. Accessed: 2019-10-03.

Eclipse IoT Working Group, 2019. Open Source for IoT. Eclipse IoT Working Group. https://www.eclipse.org/iot/. Accessed: 2018-08-29.

Egyedi, T.M., 2007. Standard-compliant, but incompatible? Comput. Standards Interfaces 29, 605–613. doi:10.1016/j.csi.2007.04.020.

Endignoux, G., Levillain, O., Migeon, J.Y., 2016.
Caradoc: a pragmatic approach to PDF parsing and validation. In: 2016 IEEE Security and Privacy Workshops (SPW), pp. 125–139. doi:10.1109/SPW.2016.39.

Fitzgerald, B., 2006. The transformation of open source software. Manage. Inf. Syst. Q. 30, 587–598.

Gamalielsson, J., Lundell, B., 2013. Experiences from implementing PDF in open source: Challenges and opportunities for standardisation processes. In: Proceedings of the 8th International Conference on Standardization and Innovation in Information Technology (SIIT) 2013, pp. 1–11. doi:10.1109/SIIT.2013.6774572.

Gerring, J., 2017. Case Study Research: Principles and Practices, second ed. Cambridge University Press, Cambridge, UK.

ICIJ, 2019. The Panama Papers: Exposing the Rogue Offshore Finance Industry. https://www.icij.org/investigations/panama-papers/. Accessed: 2019-05-29.

IETF, 2019. Internet Engineering Task Force. Internet Engineering Task Force. https://www.ietf.org/. Accessed: 2019-09-27.

IETF, 2019. QUIC (quic) — about. Internet Engineering Task Force. https://datatracker.ietf.org/wg/quic/about/. Accessed: 2019-09-24.

IETF, 2019. QUIC (quic) — documents. Internet Engineering Task Force. https://datatracker.ietf.org/wg/quic/documents/. Accessed: 2019-09-24.

ISO, 2005. Document Management — Electronic Document File Format for Long-Term Preservation — Part 1: Use of PDF 1.4 (PDF/A-1) (ISO 19005-1:2005), first ed. International Organization for Standardization, Geneva, Switzerland.

ISO, 2008. Document Management — Portable Document Format — Part 1: PDF 1.7 (ISO 32000-1:2008), first ed. International Organization for Standardization, Geneva, Switzerland.

ISO, 2011. Document Management — Electronic Document File Format for Long-Term Preservation — Part 2: Use of ISO 32000-1 (PDF/A-2) (ISO 19005-2:2011), first ed. International Organization for Standardization, Geneva, Switzerland.

ISO, 2013. Digital Compression and Coding of Continuous-Tone Still Images: JPEG File Interchange Format (JFIF) (ISO/IEC 10918-5:2013), first ed. International Organization for Standardization, Geneva, Switzerland.

ISO, 2017. Document Management — Portable Document Format — Part 2: PDF 2.0 (ISO 32000-2:2017), first ed. International Organization for Standardization, Geneva, Switzerland.

JPEG, 2019. Overview of JPEG XT. Joint Photographic Experts Group. https://jpeg.org/jpegxt/. Accessed: 2019-04-01.

Kelly, M., Nelson, M.L., Weigle, M.C., 2014. The archival acid test: Evaluating archive performance on advanced HTML and JavaScript. In: IEEE/ACM Joint Conference on Digital Libraries, pp. 25–28. doi:10.1109/JCDL.2014.6970146.

Khudairi, S., 2017. The Apache Software Foundation Recognizes Apache Innovations Behind the Pulitzer Prize-winning Panama Papers Investigation. Apache Software Foundation. https://blogs.apache.org/foundation/entry/the-apache-software-foundation-recognizes. Accessed: 2019-02-14.

Khudairi, S., 2019. Apache in 2018 — by the Digits. Apache Software Foundation. https://blogs.apache.org/foundation/entry/apache-in-2018-by-the-. Accessed: 2019-01-02.

Ko, J., Eriksson, J., Tsiftes, N., Dawson-Haggerty, S., Vasseur, J., Durvy, M., Terzis, A., Dunkels, A., Culler, D., 2011. Industry: Beyond Interoperability: Pushing the Performance of Sensor Network IP Stacks. In: Proceedings of the 9th ACM Conference on Embedded Networked Sensor Systems. ACM, New York, NY, USA, pp. 1–11. doi:10.1145/2070942.2070944.

Lehmkuhler, A., 2010.
Apache PDFBox — Working with PDFs for Dummies. The Apache Software Foundation. https://people.apache.org/lehman/apachecon/ApacheConPDFBox.pdf. Accessed: 2019-06-04.

Lehtonen, J., Helin, H., Kylander, J., Koivunen, K., 2018. PDF mayhem: is broken really broken? In: Proceedings of the 15th International Conference on Digital Preservation (iPRES 2018). doi:10.17615/IRRES-1228649.

Lindlar, M., Tunnat, Y., Wilson, C., 2017. A test-set for well-formedness validation in JHOVE — the good, the bad and the ugly. In: Proceedings of the 14th International Conference on Digital Preservation (iPRES 2017). doi:10.5281/zenodo.1228649.

Lundell, B., 2011. e-Governance in public sector ICT procurement: what is shaping practice in Sweden? Eur. J. ePractice 12, 66–78. https://joinup.ec.europa.eu/sites/default/files/document/2014-06/ePractice220Journal%20Vol.%205%202012-March_April%202011.pdf.

Lundell, B., Gamalielsson, J., 2017. On the potential for improved standardisation through use of open source work practices in different standardisation organisations: how can open source projects contribute to development of IT-standards? In: Jakobs, K. (Ed.), Digitalisation: Challenge and Opportunity for Standardisation. Proceedings of the 22nd EURAS Annual Standardisation Conference, EURAS Contributions to Standardisation Research, Vol. 12. Verlag Mainz, Aachen, pp. 137–155.

Lundell, B., Gamalielsson, J., 2018. Sustainable digitalisation through different dimensions of openness: How can lock-in, interoperability, and long-term maintenance of IT systems be addressed? In: Proceedings of OpenSym ’18. ACM, New York, NY, USA. doi:10.1145/3233391.3235527.

Lundell, B., Gamalielsson, J., Katz, A., 2018. On challenges for implementing ISO standards in software: Can both open and closed standards be implemented in open source software? In: Jakobs, K. (Ed.), Corporate and Global Standardization Initiatives in Contemporary Society. IGI Global, Hershey, PA, USA, pp. 219–251. doi:10.4018/978-1-5225-5320-5.

Lundell, B., Gamalielsson, J., Tengblad, S., Yousefi, B.H., Fischer, T., Johansson, G., Rödung, B., Mattsson, A., Oppmark, J., Gustavsson, T., Feist, J., Landemo, S., Lönnroth, E., 2017. Addressing lock-in, interoperability, and long-term maintenance challenges through open source: How can companies strategically use open source? In: Open Source Systems: Towards Robust Practices – Proceedings of the 13th IFIP WG 2.13 International Conference on Open Source Systems, OSS 2017. Springer, pp. 80–88. doi:10.1007/978-3-319-57735-7_9.

Mladenov, V., Mainka, C., Meyer zu Selhausen, K., Grothe, M., Schwenk, J., 2018a. 1 Trillion dollar refund — how to spoof PDF signatures. https://www.pdf-insecurity.org/download/paper.pdf. Accessed: 2019-05-09.

Mladenov, V., Mainka, C., Meyer zu Selhausen, K., Grothe, M., Schwenk, J., 2018b. How to break PDF signatures. https://pdf-insecurity.org/. Accessed: 2019-05-14.

Nikolich, P., I, C.-L., Korhonen, J., Marks, R., Tye, B., Li, G., Ni, J., Zhang, S., 2017. Standards for 5G and beyond: their use cases and applications. https://futurenetworks.ieee.org/tech-focus/june-2017/standards-for-5g-and-beyond. Accessed: 2019-10-03.

OMA, 2019. OMA SpecWorks. Open Mobile Alliance. https://www.omaspecworks.org/. Accessed: 2019-10-03.

Patton, M.Q., 2015. Qualitative Research and Evaluation Methods, fourth ed. Sage Publications Inc, Thousand Oaks, California, USA.

Phillips, B., 1998. Designers: the browser war casualties.
Computer 31, 14–16. doi:10.1109/2.722269.

Phipps, S., 2019. Open Source and FRAND: Why Legal Issues are the Wrong Lens. Open Forum Academy. http://www.openforumeuropa.org/wp-content/uploads/2019/03/OFA_-_Opinion_Paper_-_Simon_Phipps_-_OSS_and_FRAND.pdf. Accessed: 2019-10-03.

Piraux, M., De Coninck, Q., Bonaventure, O., 2018. Observing the evolution of QUIC implementations. In: Proceedings of the Workshop on the Evolution, Performance, and Interoperability of QUIC. ACM, New York, NY, USA, pp. 8–14. doi:10.1145/3248850.3248487.

Postel, J., 1981. RFC 793, Transmission Control Protocol. Internet Engineering Task Force. https://tools.ietf.org/html/rfc793. Accessed: 2019-04-15.

Richter, T., Clark, R., 2018. Why JPEG is not JPEG — testing a 25 years old standard. In: 2018 Picture Coding Symposium (PCS), pp. 1–5. doi:10.1109/PCS.2018.8456260.

Riehle, D., 2011. Controlling and steering open source projects. IEEE Comput. 44, 93–96. doi:10.1109/MC.2011.206.

Rossi, B., Russo, B., Succi, G., 2008. Analysis about the diffusion of data standards inside European public organizations. In: 2008 3rd International Conference on Information and Communication Technologies: From Theory to Applications, pp. 1–6. doi:10.1109/ICTTA.2008.4529953.

Shelby, Z., Hartke, K., Bormann, C., 2014. The Constrained Application Protocol (CoAP). Internet Engineering Task Force. https://www.rfc-editor.org/rfc/rfc7252.html. Accessed: 2019-10-03.

Targett, E., 2019. Meet the Apache Software Foundation's Top 5 Code Committers. Computer Business Review. https://www.cbronline.com/feature/apache-top-5. Accessed: 2019-10-04.

The Document Foundation, 2019. LibreOffice. The Document Foundation. https://www.libreoffice.org/. Accessed: 2019-09-26.

Treese, W., 1999. Putting it together: Engineering the Net: The IETF. netWorker 3, 13–19. doi:10.1145/294562.294634.

veraPDF, 2019. Industry supported PDF/A validation. veraPDF Consortium. http://verapdf.org/. Accessed: 2019-06-03.

W3C, 2019. The history of the web. World Wide Web Consortium. https://www.w3.org/History/the-history-of-the-web. Accessed: 2019-09-18.

W3C, 2019. World wide web consortium (W3C). World Wide Web Consortium. https://www.w3.org/. Accessed: 2019-09-18.

Walsham, G., 2006. Doing interpretive research. Eur. J. Inf. Syst. 15, 320–330. doi:10.1057/palgrave.ejis.3000589.

WaSP, 2019. History of the Web Standards Project. The Web Standards Project. https://www.webstandards.org/about/history/. Accessed: 2019-09-27.

Watteyne, T., Handziski, V., Vilajosana, X., Duquennoy, S., Hahn, O., Baccelli, E., Wolisz, A., 2016. Industrial wireless IP-based cyber-physical systems. Proc. IEEE 104, 1025–1038. doi:10.1109/JPROC.2015.2509186.

Wilson, C., McGuinness, R., Jung, J., 2017. veraPDF: Building an open source, industry supported PDF/A validator for cultural heritage institutions. Digital Lib. Perspect. 33, 156–165.

Wilson, J., 1998. The IETF: Laying the Net’s asphalt. Computer 31, 116–117. doi:10.1109/2.707624.

Wright, S.A., Druta, D., 2014. Open source and standards: the role of open source in the dialogue between research and standardization. In: 2014 IEEE Globecom Workshops (GC Wkshps), pp. 650–655. doi:10.1109/GLOCOMW.2014.7063506.

Simon Butler received a Ph.D. from The Open University in 2016. He is a researcher in the Software Systems Research Group at the University of Skövde in Sweden.
His research interests include software engineering, open source software, program comprehension, software development tools and practices, and software maintenance.

Jonas Gamalielsson received a Ph.D. from Heriot-Watt University in 2009. He is a senior lecturer at the University of Skövde and is a member of the Software Systems Research Group. He has conducted research related to free and open source software in a number of projects, and his research is reported in publications in a variety of international journals and conferences.

Professor Björn Lundell received a Ph.D. from the University of Exeter in 2001, and leads the Software Systems Research Group at the University of Skövde. Professor Lundell’s research contributes to theory and practice in the software systems domain, in the area of open source and open standards related to the development, use, and procurement of software systems. His research addresses socio-technical challenges concerning software systems, and focuses on lock-in, interoperability, and longevity of systems. Professor Lundell is active in international and national research projects, and has contributed to guidelines and policies at national and EU levels.

Christoffer Brax received the M.Sc. degree from the University of Skövde in 2000, and a Ph.D. from Örebro University in 2011. He is a consultant with Combitech AB working in systems engineering, requirements management, systems design and architecture, and IT security. Christoffer has 18 years' experience as a systems engineer.

Anders Mattsson received the M.Sc. degree from Chalmers University of Technology, Sweden, in 1989 and a Ph.D. in software engineering from the University of Limerick, Ireland in 2012. He has almost 30 years' experience in software engineering and is currently R&D manager for Information Products and owner of the software development process at Husqvarna AB. Anders is particularly interested in strengthening software engineering practices in organizations. Special interests include software architecture and model-driven development in the context of embedded real-time systems.

Tomas Gustavsson received the M.Sc. degree in Electrical and Computer Engineering from KTH Royal Institute of Technology in Stockholm in 1994. He is co-founder and current CTO of PrimeKey Solutions AB. Tomas has been researching and implementing public key infrastructure (PKI) systems for more than 24 years, and is founder and developer of the open source enterprise PKI project EJBCA, contributor to numerous open source projects, and a member of the board of Open Source Sweden. His goal is to enhance Internet and corporate security by introducing cost-effective, efficient PKI.

Jonas Feist received the M.Sc. degree in Computer Science from the Institute of Technology at Linköping University in 1988. He is a senior executive and co-founder of RedBridge AB, a computer consultancy business in Stockholm.

Erik Lönroth holds an M.Sc. in Computer Science and is technically responsible for the high-performance computing area at Scania IT AB. He has been leading the technical development of four generations of supercomputing initiatives at Scania and their supporting subsystems.
Erik frequently lectures on development of super computer environments for industry, open source software governance and HPC related topics.", "source": "olmocr", "added": "2025-06-23", "created": "2025-06-23", "metadata": {"Source-File": "/home/nws8519/git/adaptation-slr/studies_pdfs/004-butler.pdf", "olmocr-version": "0.1.76", "pdf-total-pages": 14, "total-input-tokens": 72508, "total-output-tokens": 24116, "total-fallback-pages": 0}, "attributes": {"pdf_page_numbers": [[0, 4084, 1], [4084, 11681, 2], [11681, 18869, 3], [18869, 26623, 4], [26623, 32978, 5], [32978, 40731, 6], [40731, 46356, 7], [46356, 52370, 8], [52370, 59382, 9], [59382, 65788, 10], [65788, 73290, 11], [73290, 80781, 12], [80781, 90315, 13], [90315, 99881, 14]]}}
{"id": "1389018351cd159f2acfae65d41398a5c1c86836", "text": "Core-Periphery Communication and the Success of Free/Libre Open Source Software Projects\n\nKevin Crowston\\textsuperscript{1(\u2709)} and Ivan Shamshurin\\textsuperscript{2}\n\n\\textsuperscript{1} Syracuse University School of Information Studies, 348 Hinds Hall, Syracuse, NY 13244\u20134100, USA\ncrowston@syr.edu\n\n\\textsuperscript{2} Syracuse University School of Information Studies, 337 Hinds Hall, Syracuse, NY 13244\u20134100, USA\nishamshu@syr.edu\n\nAbstract. We examine the relationship between communications by core and peripheral members and Free/Libre Open Source Software project success. The study uses data from 74 projects in the Apache Software Foundation Incubator. We conceptualize project success in terms of success building a community, as assessed by graduation from the Incubator. We compare successful and unsuccessful projects on volume of communication by core (committer) and peripheral community members and on use of inclusive pronouns as an indication of efforts to create intimacy among team members. An innovation of the paper is that use of inclusive pronouns is measured using natural language processing techniques. We find that core and peripheral members differ in their volume of contribution and in their use of inclusive pronouns, and that volume of communication is related to project success.\n\n1 Introduction\n\nCommunity-based Free/Libre Open Source Software (FLOSS) projects are developed and maintained by teams of individuals collaborating in globally-distributed environments [8]. The health of the developer community is critical for the performance of projects [7], but it is challenging to sustain a project with voluntary members over the long term [4, 11]. Social-relational issues have been seen as a key component of achieving design effectiveness [3] and enhancing online group involvement and collaboration [15]. In this paper, we explore how community interactions are related to community health and so project success.\n\nSpecifically, we examine contributions made by members in different roles. Members have different levels of participation in FLOSS development and so taken on different roles [5]. A widely accepted models of roles in community-based FLOSS teams is the core-periphery structure [1, 3, 12]. For example, Crowston and Howison [7] see community-based FLOSS teams as having an onion-like core-periphery structure, in which the core category includes core developers and the periphery includes co-developers and active users. Rullani and Haeffiger [17] described periphery as a \u201ccloud\u201d of members that orbits around the core members of open source software development teams.\nGenerally speaking, access to core roles is based on technical skills demonstrated through the development tasks that the developer performs [13]. Core developers usually contribute most of the code and oversee the design and evolution of the project, which requires a high level of technical skills [7]. Peripheral members, on the other hand, submit patches such as bug fixes (co-developers), which provides an opportunity to demonstrate skills and interest, or just provide use cases and bug reports or test new releases without contributing codes directly (active users), which requires less technical skill [7].\n\nDespite the difference in contributions, both core and peripheral members are important to the success of the project. 
It is evident that, by making direct contributions to the software developed, core members are vital to project development. On the other hand, even though they contribute only sporadically, peripheral members provide bug reports, suggestions and critical expertise that are fundamental for innovation [17]. In addition, the periphery is the source of new core members [10, 20], so maintaining a strong periphery is important to the long-term success of a project. Amrit and van Hillegersberg [1] examined core-periphery movement in open source projects and concluded that a steady movement toward the core is beneficial to a project, while a shift away from the core is not. But how communication among core and peripheral members predicts project success has yet to be investigated systematically, a gap that this paper addresses.

2 Theory and Hypotheses

To develop hypotheses for our study, we discuss in turn its dependent and independent variables.

The dependent variable for our study is project success. Project success for FLOSS projects can be measured in many different ways, ranging from code quality to member satisfaction to market share [6]. For the community-based FLOSS projects we examine, success in building a developer community is a critical issue, so we chose building a developer community as our measure of success.

To identify independent variables that predict success (i.e., success in building a developer community), we examine communication among community members. A starting hypothesis is that more communication is predictive of project success:

H1: Successful projects will have a higher volume of communication than unsuccessful projects.

More specifically, we are interested in how members in different roles contribute to projects. As noted above, projects rely on contributions from both core and peripheral members. We can therefore extend H1 to consider roles. Specifically, we hypothesize that:

H2a: Successful projects will have a higher volume of communication by core members than unsuccessful projects.

H2b: Successful projects will have a higher volume of communication by peripheral members than unsuccessful projects.
Prior research on the core-periphery structure in FLOSS development has found inequality in participation between core and peripheral members. For example, Luthiger Stoll [14] found that core members make a greater time commitment than peripheral members: core participants spend an average of 12 h per week, with project leaders averaging 14 h, and bug-fixers and otherwise active users, around 5 h per week. Similarly, using social network analysis, Toral et al. [19] found that a few core members post the majority of messages and act as middlemen or brokers among other peripheral members. We therefore hypothesize that:

H3: Core members will contribute more communication than will peripheral members.

Prior research on the core-periphery distinction has mostly focused on coding-related behaviour, as project roles are defined by the coding activities performed [3]. However, developers do more than just code [3]. Both core and peripheral members need to engage in social-relational behaviour in addition to task-oriented behaviour such as coding. Consideration of these non-task activities is important because effective interpersonal communication plays a vital role in the development of online social interaction [16].

Scialdone et al. [18] and Wei et al.
[21] analyzed group maintenance behaviours used by members to build and maintain reciprocal trust and cooperation in their everyday interaction messages, e.g., through emotional expressions and politeness strategies. In this paper, we examine one factor they identified, investigating how core and peripheral members use language to create “intimacy among team members”, thus “building solidarity in teams”. Specifically, Scialdone et al. [18] found that core members of two teams used more inclusive pronouns (i.e., pronouns referring to the team) than did peripheral members. They interpreted this finding as meaning that “peripheral members in general do not feel as comfortable expressing a sense of belonging within their groups”. We therefore hypothesize that:

H4: Core members will use more inclusive pronouns in their communication than will peripheral members.

Scialdone et al. [18] further noted that one team they studied, which had ceased production, had exhibited a greater gap between core and periphery in usage of inclusive pronouns. Such a situation could indicate that the peripheral members of the group do not feel ownership of the project, with negative implications for their future as potential core members. Scialdone et al. [18] noted that such use of inclusive pronouns is “consistent with Bagozzi and Dholakia [2]’s argument about the importance of we-intention in Linux user groups, i.e., when individuals think themselves as ‘us’ or ‘we’ and so attempt to act in a joint way”. A similar argument can be made for the importance of core member use of inclusive pronouns. We therefore hypothesize that:

H5a: Successful projects will have a higher usage of inclusive pronouns by core members than unsuccessful projects.

H5b: Successful projects will have a higher usage of inclusive pronouns by peripheral members than unsuccessful projects.
3 Methods

3.1 Setting

Scialdone et al. [18] and Wei et al. [21] studied only a few projects and noted problems making comparisons across projects that can be quite diverse. To address this concern, in this paper we studied a larger number of projects (74 in total) that all operated within a common framework at a similar stage of development. Specifically, we studied projects in the Apache Software Foundation (ASF) Incubator. The ASF is an umbrella organization including more than 60 FLOSS development projects. The ASF’s apparent success in managing FLOSS projects has made it a frequently mentioned model for these efforts, though often without a deep understanding of the factors behind that success.

The ASF Incubator’s purpose is to mentor new projects to the point where they are able to successfully join the ASF. Projects are invited to join the Incubator based on an application and support from a sponsor (a member of the ASF). Accepted projects (known as Podlings) receive support from one or more mentors, who help guide the Podlings through the steps necessary to become a full-fledged ASF project.

The incubation process has several goals, including fulfillment of legal and infrastructural requirements and development of relationships with other ASF projects, but the main goal is to develop effective software development communities, which Podlings must demonstrate in order to graduate from the Incubator.
The Apache Incubator specifically promotes diverse participation in development projects to improve the long-term viability of the project community and ensure requisite diversity of intellectual resources. The time projects spend in incubation varies widely, from as little as two months to nearly five years, indicating significant diversity in the efforts required for Podlings to become viable projects. The primary reason that projects are retired from the Incubator (rather than graduated) is a lack of community development that stalls progress.

3.2 Data Collection and Processing

In FLOSS settings, collaborative work primarily takes place by means of asynchronous computer-mediated communication such as email lists and discussion fora [5]. ASF community norms strongly support transparency and broad participation, which is accomplished via electronic communications, such that even collocated participants are expected to document conversations in the online record, i.e., the email discussion lists. We therefore drew our data from messages on the developers’ mailing list for each project.

A Perl script was used to collect messages in HTML format from the site http://markmail.org. We discarded any messages sent after the Podling either graduated or retired from the ASF Incubator, as many of the projects apparently used the same email list even after graduation. After the dataset was collected, relevant data were extracted from the HTML files representing each message thread and other sources.
3.2.1 Dependent Variable: Success

The dependent variable, project success in building a community, was determined by whether the project had graduated (success) or been retired (not success) based on the list of projects maintained by the Apache Incubator and available on the Apache website. The dataset includes email messages for 24 retired and 50 graduated Podlings. The dataset also includes messages for some projects still in incubation and some with unknown status; these were not used for further analysis.

As a check on this measure of successful community development, we examined the number of developers active in the community (a more successful community has more developers). We considered as active members of the projects those who sent an email to the developer mailing list during incubation.

3.2.2 Core vs. Periphery

Crowston et al. [9] suggested three methods to identify core and peripheral members in FLOSS teams: relying on project-reported formal roles, analysis of the distribution of contributions based on Bradford’s Law of Scatter, and core-and-periphery analysis of the project social network. Their analysis showed that relying on project-reported roles was the most accurate. Therefore, in this study, we identified a message sender as a core member if the sender’s name was on the list of project committers on the project website. If we did not find a match, then the sender was labeled as non-committer (peripheral member). We developed a matching algorithm to take into account the variety of ways that names appear in email messages; an illustrative sketch follows below.
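The matching problem is simple to state but fiddly in practice, since the same person may appear as “First Last”, “Last, First”, with or without accents, or with stray whitespace. The sketch below illustrates the kind of normalization and fuzzy comparison involved; it is written in Python, and its normalization rules and similarity threshold are assumptions for illustration, not the exact algorithm used in the study.

import unicodedata
from difflib import SequenceMatcher

def normalize(name):
    """Lower-case, strip accents and reorder 'Last, First' to 'first last'."""
    name = unicodedata.normalize('NFKD', name).encode('ascii', 'ignore').decode()
    name = name.lower().strip()
    if ',' in name:
        last, first = [part.strip() for part in name.split(',', 1)]
        name = f'{first} {last}'
    return ' '.join(name.split())

def is_core(sender, committers, threshold=0.9):
    """True if the sender's name closely matches any committer's name."""
    s = normalize(sender)
    return any(SequenceMatcher(None, s, normalize(c)).ratio() >= threshold
               for c in committers)

committers = ['Jane Q. Developer', 'Smith, John']
print(is_core('John Smith', committers))   # True: matches 'Smith, John'
print(is_core('Random User', committers))  # False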
3.2.3 Inclusive Pronouns

As noted above, we examined the use of inclusive pronouns as one way that team members build a sense of belonging to the group. Inclusive pronouns were defined as:

reference to the team using an inclusive pronoun. If we see “we” or “us” or “our”, and it refers to the group, then it is Inclusive Reference. Not if “we” or “us” or “our” refer to another group that the speaker is a member of.

That is, the sentences were judged on two criteria: (1) whether there are language cues for inclusive reference (a pronoun), as specified in the definition above, and (2) whether these cues refer to the current group rather than another group. To judge the second criterion may require reviewing the sentence in the context of the whole conversation. This usage is only one of the many indicators studied by Scialdone et al. [18] and Wei et al. [21], but it is interesting and tractable for analysis.

To handle the large volume of messages drawn from many projects, we applied NLP techniques as suggested (but not implemented) by previous research. Specifically, we used a machine-learning (ML) approach, where an algorithm learns to classify sentences from a corpus of already coded data. Sentences were chosen as the unit of coding instead of the thematic units more typically used in human coding, because sentences can be more easily identified for machine learning. Training data was obtained from the SOCQA (Socio-computational Qualitative Analysis) project at Syracuse University (http://socqa.org/) [22, 23]. The training data consists of 10,841
sentences drawn from two Apache projects, SpamAssassin and Avalon. Trained annotators manually coded each sentence as to whether it included an inclusive pronoun (per the above definition) or not. The distribution of the classes in the training data is shown in Table 1 (“yes” means the sentence has an inclusive pronoun). Note that the sample is unbalanced.

**Table 1.** Distribution of classes in the training data

| | # | % |
|-------|-----|-----|
| “yes” | 1395| 12.9|
| “no” | 9446| 87.1|
| Total | 10841| |

As features for the ML, we used bag of words, experimenting with unigrams, bigrams and trigrams. Multinomial Naïve Bayes (MNB), k-Nearest Neighbors (KNN) and Support Vector Machine (SVM) algorithms (Python LibSVM implementation) were trained and applied to predict the class of the sentences, i.e., whether a sentence has an inclusive pronoun or not. We expected that the NLP would have no problem handling the first part of the definition, but that the second (whether the pronoun refers to the project or some other group) would pose challenges.

10-fold cross-validation was used to evaluate the classifier’s performance on the training data. Results are shown in Table 2. The results show that though all three approaches gave reasonable performance, SVM outperformed the other methods. The Linear SVM model was therefore selected for further use. We experimented with tuning SVM parameters such as minimal term frequency, etc. but did not find settings that affected the accuracy, so we used the default settings.

**Table 2.** Classification accuracy by algorithm and feature set (10-fold cross-validation)

| | Unigram | Bigram | Trigram |
|-------|---------|--------|---------|
| MNB | 0.86 | 0.81 | 0.75 |
| KNN | 0.89 | 0.89 | 0.88 |
| SVM (LinearSVC) | 0.97 | 0.97 | 0.97 |

The random guess baseline for a binary classification task would give an accuracy of 0.5; a majority vote rule baseline (classify all examples to the majority class) provides an accuracy of 0.87. The trained SVM model significantly outperforms both.
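For readers wishing to reproduce this kind of classifier, the sketch below shows a comparable pipeline using the scikit-learn library: bag-of-words features over a configurable n-gram range feeding a linear SVM, evaluated by 10-fold cross-validation. The library and function here are assumptions for illustration; the study used the Python SVM implementation described above, and the inputs are assumed to be loaded from the coded corpus.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def evaluate(sentences, labels, ngram_range=(1, 1), folds=10):
    """Mean 10-fold cross-validation accuracy for a bag-of-words linear SVM.

    'sentences' is a list of strings; 'labels' is a parallel list of 0/1
    codes (1 = sentence contains an inclusive pronoun referring to the
    team). Pass ngram_range=(1, 2) or (1, 3) for bigram or trigram features.
    """
    clf = make_pipeline(CountVectorizer(ngram_range=ngram_range), LinearSVC())
    return cross_val_score(clf, sentences, labels, cv=folds).mean()

Applied to the 10,841 coded sentences, a call such as evaluate(sentences, labels, (1, 1)) would correspond to the unigram SVM cell of Table 2.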
To further evaluate model performance, it was applied to new data and the results checked by a trained annotator (one of the annotators of the training data set). Specifically, we used the model to code 200 sentences (10 sentences randomly selected from 5 projects each in the “graduated”, “in incubator”, “retired” and “unknown” classes of projects). The annotator coded the same sentences and we compared the results. The Cohen kappa (agreement corrected for chance agreement) for the human vs. machine coding was 88.6 %, which is higher than the frequently applied threshold of 80 % agreement. In other words, the ML model performed at least as well as a second human coder would be expected to do.
Examining the results, somewhat surprisingly, we found no cases where a predicted “inclusive reference” refers to another group, suggesting that the ML had managed to learn the second criterion. Two sentences that the model misclassified are illustrative of limitations of the approach:

*It looks like it requires work with “our @patterns” in lib/path.pm I looked at the path.pm for www.apache.org and it is a clue.*

The actual class is “no” but the classifier marked it as “yes” because the inclusive pronoun “our” was included in the sentence, though in quotes.

*Could also clarify download URLs for third-party dependencies we can’t ship.*

The actual class is “yes” but the model marked the sentence as “no” due to the error in spelling (no space after “we”). The human annotator ignored the error, but there were not enough examples of such errors for the ML to learn to do so. Despite such limitations, the benefit of being able to handle large volumes of email more than makes up for the possible slight loss in reliability of coding, especially considering that human coders are also not perfectly reliable.

4 Findings

In this section we discuss in turn the findings from our study, first validating the measure of success, then examining support for each hypothesis.

4.1 Membership

As a check on our measure of success (graduation from the Incubator), we compared the number of developers in graduated and retired projects (active developers were those who had participated on the mailing list). The results are shown in Table 3. As the table shows, graduated projects had more than twice as many developers active on the mailing list as did retired projects. The differences are so large that a statistical test of significance seems superfluous (for doubters, a Kruskal-Wallis test, chosen because the data are not normally distributed, shows a statistically significant difference in the number of developers between graduated and retired projects, p = 0.001). This result provides evidence for the validity of graduation as a measure of project community health.

**Table 3.** Mean number of active developers by project status and developer role

| Project status | Core | Peripheral |
|----------------|------------|------------|
| Graduated | 31.6 (19.4)| 82.2 (102.4)|
| Retired | 13.9 (9.3) | 25.4 (18.3) |

N = 74. Standard deviations in parentheses.
Hypothesis 1 was that successful projects would have more communication. As shown in Table 4, this hypothesis is strongly supported, as graduated projects have many times more messages sent than retired projects during the incubation process ($p = 0.0001$).

**Table 4.** Mean number of project messages by project status and developer role

| | Core | Peripheral |
|----------|------------|------------|
| Graduated| 8265 (8878)| 7306 (8908)|
| Retired | 1791 (1805)| 1652 (2058)|

$N = 74$. Standard deviations in parentheses.
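The group comparisons in this section rely on Kruskal-Wallis tests because the counts are not normally distributed. A minimal sketch of such a test, using SciPy with placeholder per-project counts rather than the study's data, is:

from scipy.stats import kruskal

graduated_devs = [120, 85, 97, 210, 60]  # placeholder per-project developer counts
retired_devs = [30, 42, 25, 38, 51]      # placeholder per-project developer counts

stat, p = kruskal(graduated_devs, retired_devs)
print(f'H = {stat:.2f}, p = {p:.4f}')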
Hypotheses 2a and 2b were that core and peripheral members respectively would communicate more in successful projects than in unsuccessful projects. The differences in Tables 4 and 5 show that these hypotheses are supported ($p = 0.0001$ for core and $p = 0.0001$ for peripheral members for overall message count in graduated vs. retired projects, and $p = 0.0011$ and $p = 0.0399$ for messages per developer).

**Table 5.** Mean number of messages sent per developer by project status and developer role

| | Core | Peripheral |
|----------|------------|------------|
| Graduated| 239 (191) | 109 (119) |
| Retired | 107 (200) | 47 (92) |

$N = 74$. Standard deviations in parentheses.

Hypothesis 3 was that core members would communicate more than peripheral members. From Table 4, we can see that in fact in total core and peripheral members send about the same volume of messages in both graduated and retired projects. However, there are fewer core members, so each sends many more messages on average, as shown in Table 5 ($p = 0.0001$).

**Table 6.** Mean number of messages including an inclusive pronoun sent per developer by project status and developer role

| | Core | Periphery |
|----------|------------|-----------|
| Graduated| 22 (18) | 6 (5) |
| Retired | 12 (8) | 4 (5) |

$N = 74$. Standard deviations in parentheses.
Hypothesis 4 was that core members would use more inclusive pronouns than peripheral members. Table 6 shows the number of messages sent by developers that included an inclusive pronoun. The table shows that core developers do send more messages with inclusive pronouns in both graduated and retired projects (p = 0.0001).

**Table 7.** Mean percentage of messages that include an inclusive pronoun per developer by project status and developer role

| | Core | Periphery |
|----------|----------|-----------|
| Graduated| 7.6 (3.4)| 5.5 (2.2) |
| Retired | 9.3 (5. )| 5.3 (3.2) |

N = 74. Standard deviations in parentheses.

To control for the fact that core developers send more messages in general, we computed the percentage of messages that include an inclusive pronoun, as shown in Table 7. From this table, we can see that the mean percentage of messages sent by core developers that include an inclusive pronoun is higher than for peripheral members (p = 0.001).

Hypotheses 5a and b were that there would be more use of inclusive pronouns by core and peripheral members respectively in successful projects. From Table 6, this hypothesis seems supported for core members at least, but note that successful projects have more communication overall. Examining Table 7 suggests that there is in fact slightly more proportional use of inclusive pronouns by core members in unsuccessful projects, but no difference in use by peripheral members. However, neither difference is significant using a Kruskal-Wallis test, meaning that Hypothesis 5 is not supported.

Finally, to assess which of the factors we examined are most predictive of project success, we applied a stepwise logistic regression, predicting graduation from the various measures of communication developed (e.g., total number of messages by developer role, mean number, percentage of messages with inclusive pronouns); a sketch of such a model is given below. Our first regression identified only one factor as predictive, the number of core members. This result can be expected, as we argued above that the number of core members can also be viewed as a measure of community health.
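A sketch of the kind of logistic regression fitted here, using the statsmodels library (a choice assumed for illustration; the column names are placeholders for the communication measures described in the text):

import statsmodels.api as sm

def fit_graduation_model(df, predictors):
    """Logistic regression of graduation (0/1) on the given measures.

    A stepwise procedure would call this repeatedly, adding or dropping
    predictors according to a criterion such as AIC or a significance test.
    """
    X = sm.add_constant(df[predictors])
    return sm.Logit(df['graduated'], X).fit(disp=0)

# 'df' is assumed to hold one row per Podling, with a 0/1 'graduated'
# column and the communication measures described in the text, e.g.:
#   model = fit_graduation_model(df, ['core_msgs_total', 'core_msgs_mean'])
#   print(model.summary())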
A regression without counts of members identified the total number and the mean number of messages sent by core members as predictive, with mean having a negative coefficient. (The $R^2$ for the regression was 33 %.) This combination of factors does not provide much insight, as it is essentially a proxy for developer count: it is greatest when there are a lot of messages but not many messages per developer, i.e., when there are more developers.

5 Discussion

In general, our data suggest that successful projects (i.e., those that successfully built a community and graduated from incubation) have more members and a correspondingly large volume of communication, suggesting an active community. As expected, core members contribute more, but overall, the message volume seems almost evenly split between core and peripheral members, suggesting that both roles play an important part in projects. These results demonstrate the importance of interaction between and the shared responsibilities of core and peripheral members.

As expected, core members do display somewhat greater ownership of the project, as expressed in the use of inclusive pronouns, but counter to our expectations, the use of inclusive pronouns did not distinguish successful and unsuccessful projects. A possible explanation for this result is a limitation in our data processing: we determined developer status (core or periphery) based on committer lists from the project website collected at the time of analysis. This process does not take into account the movement of developers from periphery to core (or, less frequently, from core to periphery). It could be that in successful projects, active peripheral members (i.e., those using more inclusive pronouns) are invited to join the core, thus suppressing the average for peripheral members.

6 Conclusions

The work presented here can be extended in many ways in future work. First, as noted, developers may change status during the project. The results would be more accurate if they took into account the history of when developers became committers to correctly assign their status over time. Obtaining such historical data is challenging but not impossible. Second, the ML NLP might be improved with a richer feature set [24], though as noted, the performance was already as good as would be expected from an additional human coder. Third, it would be interesting to examine the first few months of a project for early signs that are predictive of its eventual outcome. Fourth, it might similarly be possible to predict which peripheral members will become core members from their individual actions. Fifth, we can consider the effects of additional group maintenance behaviours from Wei et al. [21]. The Syracuse SOCQA project has had some success applying ML NLP techniques to these codes, suggesting that this analysis is feasible. Sixth, it is necessary to consider limits to the hypothesized impacts. For example, we hypothesized that more communication reflects a more developed community, but it could be that too much communication creates information overload and so has a negative impact. Finally, in this paper we have considered only communication behaviours. A more complete model of project success would take into account measures of development activities such as code commits or project topic, data for which are available online.

Despite its limitations, our research offers several advances over prior work. First, it examines a much larger sample of projects.
Second, it uses a more objective measure of project success, namely graduation from the ASF Incubator, as a measure of community development. Finally, it shows the viability of the application of NLP and ML techniques to processing large volumes of email messages, incorporating analysis of the content of messages, not just counts or network structure.

Acknowledgements. We thank the SOCQA Project (Nancy McCracken, PI) for access to the coded sentences for training and Feifei Zhang for checking the coding results. SOCQA was partially supported by a grant from the US National Science Foundation Socio-computational Systems (SOCS) program, award 11–11107.

References

1. Amrit, C., van Hillegersberg, J.: Exploring the impact of socio-technical core-periphery structures in open source software development. J. Inf. Technol. 25(2), 216–229 (2010)
2. Bagozzi, R.P., Dholakia, U.M.: Open source software user communities: a study of participation in Linux user groups. Manage. Sci. 52(7), 1099–1115 (2006)
3. Barcellini, F., Détienne, F., Burkhardt, J.-M.: A situated approach of roles and participation in open source software communities. Hum.-Comput. Interact. 29(3), 205–255 (2014)
4. Bonaccorsi, A., Rossi, C.: Why F/OSS can succeed. Res. Policy 32, 1243–1258 (2003)
5. Crowston, K., Wei, K., Howison, J., Wiggins, A.: Free/Libre open source software development: what we know and what we do not know. ACM Comput. Surv. 44(2), Article 7 (2012)
6. Crowston, K., Howison, J., Annabi, H.: Information systems success in free and open source software development: theory and measures. Softw. Process Improv. Pract. 11(2), 123–148 (2006)
7. Crowston, K., Howison, J.: Assessing the health of open source communities. IEEE Comput. 39(5), 89–91 (2006)
8. Crowston, K., Li, Q., Wei, K., Eseryel, U.Y., Howison, J.: Self-organization of teams for Free/Libre open source software development. Inf. Softw. Technol. 49(6), 564–575 (2007)
9. Crowston, K., Wei, K., Li, Q., Howison, J.: Core and periphery in Free/Libre and open source software team communications. In: Proceedings of the Hawai‘i International Conference on System Sciences (HICSS-39) (2006)
10. Dahlander, L., O’Mahony, S.: Progressing to the center: coordinating project work. Organ. Sci. 22(4), 961–979 (2011)
11. Fang, Y., Neufeld, D.: Understanding sustained participation in open source software projects. J. Manage. Inf. Syst. 25(4), 9–50 (2009)
12. Jensen, C., Scacchi, W.: Role migration and advancement processes in OSSD projects: a comparative case study. In: Proceedings of the 29th International Conference on Software Engineering (ICSE), pp. 364–374 (2007)
13. Jergensen, C., Sarma, A., Wagstrom, P.: The onion patch: migration in open source ecosystems. In: Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, pp. 70–80 (2011)
14. Luthiger Stoll, B.: Fun and software development. In: Proceedings of the First International Conference on Open Source Systems, Genova, Italy, 11–15 July 2005
15. Park, J.R.: Interpersonal and affective communication in synchronous online discourse. Libr. Q. 77(2), 133–155 (2007)
16. Park, J.-R.: Linguistic politeness and face-work in computer mediated communication, part 2: an application of the theoretical framework. J. Am. Soc. Inf. Sci. Technol. 59(14), 2199–2209 (2008)
17. Rullani, F., Haefliger, S.: The periphery on stage: the intra-organizational dynamics in online communities of creation. Res. Policy 42(4), 941–953 (2013)
18. Scialdone, M.J., Heckman, R., Crowston, K.: Group maintenance behaviours of core and peripheral members of Free/Libre open source software teams. In: Proceedings of the IFIP WG 2.13 Working Conference on Open Source Systems, Skövde, Sweden, 3–6 June 2009
19. Toral, S.L., Martínez-Torres, M.R., Barrero, F.: Analysis of virtual communities supporting OSS projects using social network analysis. Inf. Softw. Technol. 52(3), 296–303 (2010)
20. von Krogh, G., Spaeth, S., Lakhani, K.R.: Community, joining, and specialization in open source software innovation: a case study. Res. Policy 32(7), 1217–1241 (2003)
21. Wei, K., Crowston, K., Li, N.L., Heckman, R.: Understanding group maintenance behaviour in Free/Libre open-source software projects: the case of fire and gaim. Inf. Manage. 51(3), 297–309 (2014)
22. Yan, J.L.S., McCracken, N., Crowston, K.: Design of an active learning system with human correction for content analysis. Paper presented at the Workshop on Interactive Language Learning, Visualization, and Interfaces, 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, MD, June 2014. http://nlp.stanford.edu/events/illvi2014/papers/mccracken-illvi2014.pdf
23. Yan, J.L.S., McCracken, N., Crowston, K.: Semi-automatic content analysis of qualitative data. In: Proceedings of the iConference, Berlin, Germany, 4–7 Mar 2014
24. Yan, J.L.S., McCracken, N., Zhou, S., Crowston, K.: Optimizing features in active machine learning for complex qualitative content analysis. Paper presented at the Workshop on Language Technologies and Computational Social Science, 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, MD, June 2014", "source": "olmocr", "added": "2025-06-23", "created": "2025-06-23", "metadata": {"Source-File": "/home/nws8519/git/adaptation-slr/studies_pdfs/005-crowston-shamshurin.pdf", "olmocr-version": "0.1.76", "pdf-total-pages": 12, "total-input-tokens": 26748, "total-output-tokens": 7919, "total-fallback-pages": 0}, "attributes": {"pdf_page_numbers": [[0, 2626, 1], [2626, 5540, 2], [5540, 8685, 3], [8685, 11666, 4], [11666, 14804, 5], [14804, 17428, 6], [17428, 19739, 7], [19739, 21689, 8], [21689, 24510, 9], [24510, 27661, 10], [27661, 30815, 11], [30815, 32524, 12]]}}
{"id": "2360e4058f84e5d8dee9562b4a667b6562feea3e", "text": "An Exploratory Mixed-Methods Study on General Data Protection Regulation (GDPR) Compliance in Open-Source Software\n\nLucas Franke\nlfranke@vt.edu\nVirginia Tech\nBlacksburg, Virginia, USA\n\nHuayu Liang\nhuayu98@vt.edu\nVirginia Tech\nBlacksburg, Virginia, USA\n\nSahar Farzanehpour\nsaharfarza@vt.edu\nVirginia Tech\nBlacksburg, Virginia, USA\n\nAaron Brantly\nabrantly@vt.edu\nVirginia Tech\nBlacksburg, Virginia, USA\n\nJames C. Davis\ndavisjam@purdue.edu\nPurdue University\nWest Lafayette, Indiana, USA\n\nChris Brown\ndcbrown@vt.edu\nVirginia Tech\nBlacksburg, Virginia, USA\n\nABSTRACT\n\nBackground: Governments worldwide are considering data privacy regulations. These laws, such as the European Union\u2019s General Data Protection Regulation (GDPR), require software developers to meet privacy-related requirements when interacting with users\u2019 data. Prior research describes the impact of such laws on software development, but only for commercial software. Although open-source software is commonly integrated into regulated software, and thus must be engineered or adapted for compliance, we do not know how such laws impact open-source software development.\n\nAims: Understanding how data privacy laws affect open-source software development. We focused on the European Union\u2019s GDPR, as it is the most prominent such law. We specifically investigated how GDPR compliance activities influence OSS developer activity (RQ1), how OSS developers perceive fulfilling GDPR requirements (RQ2), the most challenging GDPR requirements to implement (RQ3), and how OSS developers assess GDPR compliance (RQ4).\n\nMethod: We distributed an online survey to explore perceptions of GDPR implementations from open-source developers (N=56). To augment this analysis, we further conducted a repository mining study to analyze development metrics on pull requests (N=31,462) submitted to open-source GitHub repositories.\n\nResults: Our results suggest GDPR policies complicate open-source development processes and introduce challenges for developers, primarily regarding the management of users\u2019 data, implementation costs and time, and assessments of compliance. Moreover, we observed negative perceptions of GDPR from open-source developers and significant increases in development activity, in particular metrics related to coding and reviewing activity, on GitHub pull requests (PRs) related to GDPR compliance.\n\nConclusions: Our findings provide future research directions and implications for improving data privacy policies, motivating the need for policy-related resources and automated tools to support data privacy regulation implementation and compliance efforts in open-source software.\n\n1 INTRODUCTION\n\nSoftware products collect an increasing amount of data from users to enhance user experiences through personalized, machine learning-enabled [53] application behaviors [33] and marketing [79]. Such practices may benefit users, but also threaten their well-being. For example, in 2013, Facebook allowed the political research firm Cambridge Analytica to access data on ~87 million Facebook users [62]. Cambridge Analytica used this data to influence US elections [114, 115].\n\nTo protect their citizens, over 100 governments worldwide are developing data privacy regulations [105]. Their goal is to constrain how their citizens\u2019 personal data is collected, processed, stored, and saved. 
Some target specific industries, e.g., the United States’s Health Insurance Portability and Accountability Act (HIPAA), which places requirements on healthcare organizations handling medical data [7]. Others cover personal data regardless of context, e.g., the European Union’s General Data Protection Regulation (GDPR), which grants rights to EU citizens and affects entities that handle their data [12]. The penalties for non-compliance with data privacy laws and regulations may be severe [18, 46]. For example, under GDPR, corporations have been fined millions or billions of euros [80]. Most organizations store and manipulate this data electronically through software, and so ensuring the software is in legal compliance is an important software engineering task.

Data privacy regulations create challenging software requirements because meeting them requires both technical and legal expertise. Software developers must implement required features, such as obtaining consent from users for data collection, to ensure their organizations’ products are compliant. However, developers may have limited legal knowledge [81, 109] and receive minimal training [21, 55]. This can lead to coarse solutions, such as exiting the affected market [88] — hundreds of websites simply banned all European users when GDPR went into effect [97, 103]. Researchers have explored the impact of data privacy regulations on businesses [72, 73, 88], users [22, 32, 68], and observable software product properties such as website cookies [67] and database performance [92]. However, there has been limited study of how such laws affect the software development process. The few existing studies have been of commercial software development [20, 29]; we lack knowledge of the effects of GDPR on open-source software (OSS) development.

The goal of this work is to describe the impact of data privacy regulation compliance on open-source software. Our study is the first on this topic.2 We therefore adopt an exploratory methodology to provide an initial characterization and identify phenomena of interest for further study. Our study draws on two data sources collected in two phases. The first phase examined qualitative data on developers’ experiences with GDPR implementations in OSS, collected via a survey (N=56). To further investigate the impact of GDPR in OSS, the second phase collected and analyzed developers’ activities in open-source projects on GitHub, examining metrics and sentiments on 31,462 pull requests, divided evenly into 15,731 GDPR-related and 15,731 non-GDPR-related pull requests (PRs).

2This paper is an extension of our preliminary work, presented as a poster [44].

Our results show GDPR compliance negatively impacts open-source development — incurring complaints from developers and significantly increasing coding and reviewing activities on PRs. In addition, despite the benefits of data privacy regulations for users, we find developers have mostly negative perceptions of the GDPR, reporting challenges with implementing and verifying policy compliance.
We also find that interactions with legal experts hinder development processes, yet developers rarely consult with legal teams — often relying on ad hoc methods to verify GDPR compliance.

In sum, our contributions are:

- We survey OSS developers to understand developers’ experiences with GDPR compliance and challenges with implementing and assessing data privacy regulations.
- We empirically analyze the impact of GDPR-related implementations on development activity metrics.
- We use natural language processing (NLP) techniques to evaluate the perceptions of GDPR compliance through discussions on OSS repositories.

Significance: This work contributes an exploratory analysis of the impact of GDPR compliance on open-source software. It identifies interesting phenomena for further research — in particular opportunities to support policy implementation and verification. We also provide recommendations for policymakers and software developers to improve data privacy regulations and their implementation.

2 BACKGROUND

2.1 Software Regulatory Compliance

2.1.1 In General. Software requirements are divided into two categories: functional and non-functional [96]. Functional requirements pertain to input/output characteristics, i.e., the functions the software computes. Non-functional requirements cover everything else, such as resource constraints, deployment conditions, and development process. One major class of non-functional requirement is compliance with applicable standards and regulations. These requirements are typically developed and enforced on a per-industry basis in acknowledgment of that industry’s risks and best practices [54].

Complying with standards and regulations has been part of software engineering work for many years. Some standards apply to any manufacturing process, e.g., the ISO 9001 quality standard [11]. Others are generic to software development (e.g., ISO/IEC/IEEE 90003 [10]). Still others are tailored to the risk profile of the usage context, e.g., ISO 26262 [13] or IEC 61508 [9], which describe standards for safety-critical systems [54]; the US HIPAA law (Health Insurance Portability and Accountability Act), which describes privacy standards for handling medical data [7]; and the US FERPA law (Family Educational Rights and Privacy Act), which describes privacy standards for handling educational data [5]. Although these regulations are not new (e.g., FERPA dates to 1974, HIPAA to 1996, and IEC 61508 to 1998), software engineering teams still struggle to comply with them [34, 40, 43, 75].

2.1.2 In Open-Source Software. This study focuses on GDPR compliance in open-source software. The reader may be surprised that regulatory compliance is a factor in open-source software development, as open-source software licenses such as MIT [3], Apache [8], and GNU GPL [6] disclaim legal responsibility. For example, the MIT license, the most common license on GitHub [27], states “the software is provided ‘as is’, without warranty...[authors are not] liable for any claim, damages, or other liability”. However, users and developers of open-source software may desire regulatory compliance. We note three examples. (1) A majority of open-source software is developed for commercial use [47] and may require standards or regulatory compliance [108]. (2) Users with open-source software components in their software supply chains [52, 83] may request compliance-related features, such as the handling of web cookies; the developers may service these requests.
(3) Users may extend open-source software themselves and undertake their own compliance analysis [99]. Standards such as IEC 61508–Part 3 include provisions for doing so [60].

Open-source software is no longer a minor player in commercial software engineering. Multiple estimates suggest that open-source components comprise the majority of many software applications [47, 82]. In a 2023 survey of ~1700 codebases across 17 industries, Synopsys found open-source software in 96% of the codebases and reported that it contributed an average of 75% of the code per codebase [101]. It is therefore important to understand how open-source software development considers non-functional requirements such as regulatory compliance.

2.2 Privacy Regulations, Especially GDPR

2.2.1 Consumer Privacy Laws. In §2.1 we discussed standards and regulatory requirements that affect software products based on industry. Recently a new kind of regulation has begun to affect software: consumer privacy laws. The most prominent example of such a law is the European Union’s General Data Protection Regulation (EU GDPR), enacted in 2016 and enforceable beginning in 2018. Examples in the United States include the California Consumer Privacy Act (CCPA, enacted 2018) and the Virginia Consumer Data Protection Act (CDPA, enacted 2021). Similar legislation has been considered by >100 governments [59, 105].

2.2.2 The General Data Protection Regulation (GDPR). The General Data Protection Regulation (GDPR) [12] protects the personal data of European Union (EU) citizens, regardless of whether data collection and processing are based in the EU. The law has implications for entities that interact with the personal data of EU citizens, divided into data subjects, data controllers, and data processors [45]. Data subjects are individuals whose personal data is collected. Data controllers are any entities — organization, company, individual, or otherwise — that own, control, or are responsible for personal data. Data processors are entities that process data for data controllers. The GDPR grants data subjects rights to their personal data, providing guidelines and requirements to data controllers and processors to understand how to properly handle this data.

GDPR compliance is complex for software engineers and consequential for their organizations. Data controllers and processors commonly use software, e.g., a controller’s mobile app transmits data to its backend service and processors subsequently access and update the database. Software teams must determine appropriate data policies, update their systems to comply, and validate them, e.g., incorporating cookie consent notices into websites to provide users with informed consent [106]. Anticipating a lengthy compliance process, the EU enacted the GDPR in 2016 but made it enforceable in 2018, allowing two years for corporations to prepare [1]. Companies in the US and UK alone invested $9 billion in GDPR compliance [110]. As of December 2022, many organizations still use manual compliance methods or are not compliant [14].
Non-compliance is costly: thousands of distinct fines have been imposed on non-compliant data controllers and processors, exceeding €2.5 billion [15].

Although GDPR compliance affects any software that processes the data of EU citizens, and open-source software components comprise the majority of many software applications that process such data [47, 82, 101], to the best of our knowledge there is no prior research on the impacts of GDPR compliance in open-source software.

3 METHODOLOGY

3.1 Data Availability and Research Questions

In §2 we described a range of privacy-related standards and regulations. We noted that there has been little study of the effect of these requirements on open-source software engineering practice. To address this gap, we need data. Table 1 estimates the availability of software engineering data associated with these requirements through two common metrics: the number of posts on Stack Overflow and the number of pull requests on GitHub.

Table 1: Estimated availability of software engineering data per privacy law, measured as Stack Overflow posts and GitHub pull requests.

| Privacy Law (Year) | Stack Overflow | GitHub-PRs |
|-------------------|----------------|------------|
| GDPR (2016) | 2058 | 64 K |
| HIPAA (1996) | 725 | 5 K |
| CCPA (2018) | 96 | 1 K |
| FERPA (1974) | 35 | 254 |
| CDPA (2021) | 7 | 19 |
| PIPEDA (2000) | 5 | 31 |

Based on this data, we scoped our study to the EU’s GDPR, and to open-source software hosted on GitHub, currently the most popular hosting platform for OSS. We answer four research questions:

**RQ1:** How does GDPR compliance influence development activity on OSS projects?

**RQ2:** How do OSS developers perceive fulfilling GDPR requirements?

**RQ3:** What GDPR concepts do OSS developers find most challenging to implement?

**RQ4:** How do OSS developers assess GDPR compliance?

We analyzed data from quantitative and qualitative sources: surveying open-source developers and mining OSS repositories on GitHub. We present how we obtained and analyzed each data source next. We integrate this data in answering RQ1 and RQ2, and use the survey data alone to answer RQ3 and RQ4.
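Counts like those in Table 1 can be gathered from the platforms’ public search APIs. The following is a minimal sketch under that assumption (the exact queries behind Table 1 may differ):

```python
# Minimal sketch: count GitHub pull requests and Stack Overflow posts
# mentioning a privacy law. Unauthenticated requests are rate-limited.
import requests

TERM = "GDPR"

gh = requests.get(
    "https://api.github.com/search/issues",
    params={"q": f"{TERM} is:pr", "per_page": 1},
    headers={"Accept": "application/vnd.github+json"},
)
print("GitHub PRs:", gh.json()["total_count"])

so = requests.get(
    "https://api.stackexchange.com/2.3/search/advanced",
    params={"q": TERM, "site": "stackoverflow", "filter": "total"},
)
print("Stack Overflow posts:", so.json()["total"])
```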
3.2 Data Source 1: Developer Survey

To explore the impact of implementing GDPR policies on OSS development, we distributed an online survey for open-source developers. This data informed our answers to all RQs. We used a four-step approach motivated by the framework analysis methodology [90] for policy research to collect and analyze data in this phase of our study. An overview of this process is presented in Table 2. Our Institutional Review Board (IRB) provided oversight.

3.2.1 Step 1: Pilot Study and Data Familiarization. To formulate an initial thematic framework for our qualitative analysis, we conducted semi-structured pilot interviews with OSS developers (n = 3). As no prior work has explored the perceptions of GDPR compliance in OSS, pilot interviews gave us insight into developers’ perceptions and experiences with implementing GDPR concepts in the context of open-source software development. Two subjects had contributed to PRs in our dataset, and the third was a personal contact. They had a wide range of open-source development experience, from < 1 year to > 20 years. Interviews were transcribed using Otter.ai and coded by two researchers to inform our survey.

Thematic analysis of our pilot interviews provided insight that informed our survey questions. The participants highlighted the challenges with implementing GDPR requirements in open-source software. One participant worked at a large corporation and outlined differences between GDPR compliance at their company and in OSS, namely with (1) approaches used to assess whether compliance is implemented correctly, and (2) access to legal teams. The other two participants discussed the impact of the GDPR, noting its privacy benefits as well as challenges OSS developers face implementing GDPR requirements and assessing compliance. These findings informed our survey.

3.2.2 Step 2: Survey Design. The survey consisted of open-ended and short answer questions seeking details about GDPR implementation and experiences in the context of open-source software development. We used the pilot study interview results to identify topics to focus on in the survey. Based on the interviews, we asked about the perceived impact of the GDPR on data privacy, the most difficult concepts to implement, and how developers assess GDPR compliance. The survey instrument is in the supplemental material.

3.2.3 Step 3: Participant Recruitment. We distributed our survey in three rounds. In the first round, we emailed a sample of 98 developers who authored or commented on GDPR-related pull requests and had publicly available email addresses. We received 5 responses, i.e., a 5% response rate. In the second round, we made broader calls for participation on Twitter and Reddit. We received 44 responses, 2 of which indicated no experience implementing GDPR compliance. All survey respondents in these rounds were entered in a drawing for two $100 Amazon gift cards. After a few months, we undertook the third round, redistributing our survey to an additional 235 GitHub users with GDPR implementation experience (i.e., they authored GDPR-related pull requests in our dataset) and offered individual compensation (a $10 gift card) to encourage participation. We received 9 responses (4% response rate). In total we have data from 56 survey participants (14 from direct GitHub contacts and 42 from Twitter and Reddit).

Table 2: Overview of sample questions from the pilot interview study and survey design/analysis for the framework analysis approach used for Data Source 1. The final column notes the inter-rater agreement score for these themes using the $\kappa$ score, prior to reaching agreement.

| Interview Question | Codes | Survey Question | Codes | $\kappa$ |
|--------------------|-------|----------------|-------|---------|
| What meaningful impact, if any, do you believe the GDPR has had on data security and privacy? | data privacy, rights to users, data collection | What impact, if any, do you believe the GDPR and similar data privacy regulations have had on data security and privacy? | data privacy, data processing, data collection, insufficient information, data breach, fines | 0.736 |
| What GDPR concepts do you find the most difficult or frustrating to implement? | None, data minimization, embedded content | What GDPR concepts do you find the most difficult or frustrating to implement? | privacy by design, data minimization, cost, data processing, user experience, data management, security risks, None, lawfulness and dispute resolution, time, right to erasure | 0.929 |
| Have you had to specifically seek out legal consultation on GDPR-related issues, and if so, how did that affect your development process? | Yes/No; no effect, negative effect (time) | Have you had to specifically seek out legal consultation on GDPR-related issues, and if so, how did that affect your development process? | Yes/No; N/A, no effect, positive effect, negative effect (cost, time, data storage, data processing,...) | 0.514 |
| During your software development projects, do you frequently consult with a legal team, and if so, how does this impact the development processes? If not, how did you assess GDPR compliance for your software projects? | Yes: legal consultation; No: privacy by design, data minimization | During your software development projects, have you consulted with a legal team? If not, how do you assess GDPR compliance for your software projects? | Yes: legal consultation; No: accountability system, online resources, self-assessment, data management, none; N/A | 0.668 |
| — | — | Has implementing GDPR concepts for compliance impacted your development process in any way? (yes/no/maybe) Please explain: | positive impact (logging, privacy by design), negative impact (cost, data management, security,...), no impact | 0.860 |
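As an illustration of the agreement computation behind the $\kappa$ column of Table 2, the following minimal sketch (with hypothetical labels, not our coded data) uses scikit-learn’s implementation of Cohen’s kappa:

```python
# Minimal sketch with hypothetical labels: Cohen's kappa between two
# coders' theme assignments for one survey question.
from sklearn.metrics import cohen_kappa_score

coder_1 = ["data privacy", "cost", "cost", "data management", "time", "cost"]
coder_2 = ["data privacy", "cost", "time", "data management", "time", "cost"]

print(f"kappa = {cohen_kappa_score(coder_1, coder_2):.3f}")
```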
Our participants have a median of approximately 5 years of OSS development experience (avg = 5.9) and 6 years of general industry experience (avg = 7.7). Participants reported contributing to a variety of OSS projects such as Mozilla, WordPress, Fedora, Moodle, Ansible, Flask, Django, Kubernetes, PostgreSQL, OpenCV, GitLab, and Microsoft Cognitive Toolkit.

3.2.4 Step 4: Data Analysis. To analyze our survey results, we used an open coding approach. Two researchers independently performed a manual inspection of responses, highlighting keywords and categorizing responses based on the pre-defined themes derived from our pilot study. If new themes arose, the coders discussed and agreed upon adding the new theme. Then, both coders came together to merge their individual results. Finally, we used Cohen’s kappa ($\kappa$) to calculate inter-rater agreement (see Table 2).

3.3 Data Source 2: GDPR PRs on GitHub

We collected data concerning GDPR compliance by analyzing pull requests on GitHub repositories. Pull requests are a GitHub mechanism that allows developers to collaborate on open-source repositories by contributing code changes to be reviewed and merged into the source code [48].

3.3.1 GDPR and non-GDPR PRs. We used the GitHub REST API to search for GDPR-related pull requests — pull requests returned by the GitHub API’s default search with the query string “GDPR”. Manual inspection suggested the results are typically English-language PRs related to (GDPR) data privacy regulatory compliance.

Using this method, we collected GDPR-related PRs created from April 2016 (when the GDPR was adopted by the European Parliament) to January 2024. We removed content submitted by users with “bot” in their username [16] and designated as a bot type according to the GitHub API3 to avoid PRs generated by automated systems. This resulted in 15,731 GDPR-related pull requests across 6,513 unique GitHub repositories. For comparison, we also collected a random sample of 15,731 pull requests created in these same repositories after April 2016 that did not mention “GDPR”, which we call non-GDPR-related pull requests. The studied repositories had a median of 14 stars (avg = 1,635), 11 forks (avg = 416), 727 commits (avg = 8,997), 172 PRs (avg = 1,425), and 15 contributors (avg = 59), suggesting popular, active repositories. The distribution of PRs across all repositories in our GDPR-related and non-GDPR-related datasets is summarized in Table 3.
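A minimal sketch of this collection step, assuming the standard GitHub REST search endpoint (the paper’s exact pipeline is not shown here), could look as follows. Note that a single search query returns at most 1,000 results, so gathering the full dataset would require slicing the query, e.g., by creation date.

```python
# Minimal sketch: search GDPR-related PRs and filter out bot-like authors.
import requests

def search_gdpr_prs(token, page=1):
    resp = requests.get(
        "https://api.github.com/search/issues",
        params={"q": "GDPR is:pr created:2016-04-01..2024-01-31",
                "per_page": 100, "page": page},
        headers={"Authorization": f"Bearer {token}",
                 "Accept": "application/vnd.github+json"},
    )
    resp.raise_for_status()
    return resp.json()["items"]

def is_bot(item):
    user = item.get("user") or {}
    # Bot-like accounts: "bot" in the username, or account type "Bot"
    # according to the GitHub API.
    return "bot" in user.get("login", "").lower() or user.get("type") == "Bot"

# prs = [pr for pr in search_gdpr_prs(token="<your token>") if not is_bot(pr)]
```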
3.3.2 Measuring Development Activity. To analyze GDPR’s impacts, we collected the following development activity metrics [49] per pull request:

- **Comments:** the total number of comments
- **Active time:** the amount of time the PR remained active (until merged or closed)
- **Commits:** the total number of commits
- **Additions:** the number of lines of code added
- **Deletions:** the number of lines of code removed
- **Changed files:** the total number of modified files
- **Status:** the outcome of the PR (merged, closed, or open)

We selected these metrics to analyze development activity, specifically to derive coding and code review tasks from pull requests. We compared the distributions of these metrics between GDPR-related and non-GDPR-related PRs using a Mann-Whitney U test, to compare nonparametric ordinal data between the datasets [76]. To control for multiple comparisons on the same dataset, we calculate adjusted p-values using Benjamini-Hochberg correction [30]. We measure effect size ($r$) for significant results using Cohen’s $d$ [39].

Table 3: Distribution of PRs in Datasets.

| Dataset | min | 50%ile | 75%ile | 90%ile | max |
|---------|-----|--------|--------|--------|-----|
| GDPR | 1 | 1 | 2 | 3 | 956 |
| non-GDPR| 1 | 2 | 10 | 34 | 203 |

3https://docs.github.com/en/graphql/reference/objects#bot

3.3.3 Measuring Developer Perception. To augment our survey results, we applied sentiment analysis — a technique to automatically infer sentiment from natural language — on the title, body, commit messages, review comments, and discussion comments from pull requests in our datasets to examine developer perceptions of GDPR compliance. Prior studies have similarly inferred developer sentiment and emotion from GitHub activity, including PR discussion comments [87], review comments [57], commit messages [50], and bodies [84]. While this technique sometimes performs poorly in software engineering contexts [64], we use it in our exploratory work as a proxy to obtain preliminary insights into developers’ sentiments regarding GDPR compliance in OSS.

We followed standard NLP preprocessing steps [69]: (1) We removed bot-generated content using the process described in Section 3.3.1. (2) We removed non-sentiment material: hyperlinks and mentions (“@username”). (3) We tokenized text using the Natural Language Toolkit (NLTK) tokenize library. (4) We converted tokens to lowercase and removed punctuation. (5) We removed stopwords such as “but” and “or” (nltk.corpus library). (6) We lemmatized the text, i.e., reduced words to their base form (e.g., “mice” becomes “mouse” [23]), using WordNetLemmatizer from the nltk.stem library. (7) We normalized the data by removing meaningless tokens, such as SHA or hash values for commits, and non-standard English words, such as words that contain numerical values (e.g., “3d”) [98].

After preprocessing the data, we were left with 15,731 titles, 14,515 bodies, 15,217 commit messages, 4,922 review comments, and 4,862 discussion comments across the GDPR-related pull requests. We compared these against non-GDPR-related PRs, for which we had 15,731 titles, 13,718 bodies, 15,652 commit messages, 3,427 review comments, and 3,165 discussion comments.

To perform sentiment analysis, we use three state-of-the-art models: Liu-Hu [56], VADER [58], and SentiArt [63]. We fed the preprocessed textual data to each model, which provided compound sentiment scores.
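A minimal sketch of these preprocessing and scoring steps is shown below, using one of the three models (VADER); the calls follow the NLTK APIs named above.

```python
# Minimal sketch of steps (3)-(7) above plus VADER scoring.
import string
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from nltk.sentiment.vader import SentimentIntensityAnalyzer

for pkg in ("punkt", "stopwords", "wordnet", "vader_lexicon"):
    nltk.download(pkg, quiet=True)

def preprocess(text):
    tokens = word_tokenize(text)                                         # (3)
    tokens = [t.lower() for t in tokens if t not in string.punctuation]  # (4)
    stops = set(stopwords.words("english"))
    tokens = [t for t in tokens if t not in stops]                       # (5)
    lem = WordNetLemmatizer()
    tokens = [lem.lemmatize(t) for t in tokens]                          # (6)
    tokens = [t for t in tokens if t.isalpha()]                # (7) drop SHAs, "3d", ...
    return " ".join(tokens)

sia = SentimentIntensityAnalyzer()
text = "Add cookie consent banner to comply with GDPR"
print(sia.polarity_scores(preprocess(text))["compound"])
```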
We use a t-test ($t$) to statistically analyze sentiment across our datasets. Moreover, we aim to assess the impact of the GDPR on developer sentiment over time. To accomplish this, we divided the GDPR and non-GDPR PRs into 3-month segments based on the creation date of the PR. Then, we performed sentiment analysis on the binned data to observe whether and how developer sentiments manifest in OSS interactions over the lifecycle of the GDPR regulation — from its initial adoption in 2016, to its enforcement in 2018, to the present. We combined all preprocessed textual elements (title, body, commit messages, review comments, and discussion comments) to observe the overall trends in PR communications and compare with non-GDPR data as a baseline sentiment in developer communications for the projects studied.

### 4 RESULTS

We are interested in understanding the impact of GDPR implementations on open-source software by analyzing development activity and developer perceptions, including challenges with implementation and assessment of compliance. In this work, we answer our research questions using multiple sources — analyzing GitHub repositories and surveying open-source developers. For RQ1 and RQ2, we report views from the survey and the GitHub measurements. For RQ3 and RQ4, we use data only from the survey.

#### 4.1 RQ1: Development Activity

This question was: **RQ1: How does GDPR compliance influence development activity on OSS projects?**

##### 4.1.1 Survey

We surveyed 56 OSS developers to understand the impact of GDPR implementations on development activity. Most participants ($n = 41, 73\%$) responded “Yes” to a question regarding the impact of implementing GDPR concepts on development processes, indicating data privacy compliance affects open-source development. When asked to elaborate, 23 developers provided examples of development impacts related to the GDPR.

**Data Management:** 11 participants mentioned that GDPR requirements related to data management impact development activity, notably by increasing development effort. For instance, responses indicated handling personal data (P17) and anonymization (P19), managing data controllers (P21) and data recipients (P23), implementing functionality to limit the collection of personal data (P26), and the monitoring of data subjects from the EU (P28) all impacted development processes. P53 also added “we had to separate in a clear way sensitive data from the other data”, exemplifying the effort needed to implement compliant data processing in OSS.

**Time and Costs:** Five participants mentioned GDPR compliance increases development time and costs in OSS. For example, regarding time, respondents said “it does slow down our development cycle” (P54) and “we lost a complete year to be ready” (P56). For costs, participants said “budgets have soared” (P5) and “costs of production should not go over the cost of consequence of data breach” (P46).

**Design:** Three participants also noted the effects of GDPR compliance on the design and structure of software products. For example, P54 responded “we have to check whether we comply with GDPR every time we draft a new design” and P55 added “the design of systems now incorporates the concept of needing to remove PII after the fact”.
P21 explained how GDPR compliance reduced the quality of their application’s design — replying “the principle of minimum scope was not observed” — indicating potentially unnecessary extensions of variable scopes in the code [36].

**Organization:** Three participant responses described the negative effects of data privacy regulations on their organization, stating the GDPR has a “major impact” requiring “an overhaul of project management and program priorities” (P1). P45 highlighted that “making sure to follow privacy by design” is challenging for GDPR compliance in OSS development. One participant also mentioned additional steps to verify implementations affected their development, stating “we need to make an additional review with the GDPR consultants that functionality that is related to the users’ data” (P53).

**Benefits:** One participant mentioned benefits to their development team and processes regarding the implementation of GDPR concepts, stating it helped highlight “things we had not considered before”, such as ensuring that “logging functionality” and “access restrictions” were in place (P1). However, the majority of responses indicate that GDPR compliance often increases development efforts and incurs negative impacts for open-source developers.

##### 4.1.2 Pull Request Metrics

To further observe the impact of GDPR compliance on OSS, we compared metrics for GDPR-related and non-GDPR-related PRs. Table 4 presents these results. Using a Mann-Whitney U test, we found statistically significant differences between GDPR and non-GDPR PRs in the number of comments, active time, number of commits, lines of code added, lines of code deleted, and number of modified files. We also calculate the effect size for these results.

This indicates that incorporating changes related to the GDPR has a major impact on development work, leading to increased discussions between developers, longer review times, more code commits, and higher code churn. While we observed significant differences in pull request metrics between GDPR and non-GDPR PRs, the calculated effect sizes are “small” [71], indicating low practical differences between the groups. Yet, these findings support our survey results from open-source developers reporting that GDPR compliance efforts affect OSS development.

**Finding 1:** Developers report implementing GDPR compliance negatively affects development processes — citing cost, time, and data management as concerns.

**Finding 2:** PRs related to GDPR compliance have significantly more development activity for coding (commits, additions, deletions, files changed) and review (comments, active time) tasks.
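A minimal sketch of the kind of per-metric comparison reported in Table 4 below (with hypothetical metric values; the real analysis runs over all 31,462 PRs):

```python
# Minimal sketch with hypothetical values: Mann-Whitney U tests per metric,
# with Benjamini-Hochberg adjustment for multiple comparisons.
from scipy.stats import mannwhitneyu
from statsmodels.stats.multitest import multipletests

metrics = {  # metric -> (GDPR values, non-GDPR values)
    "comments":  ([1, 3, 0, 5, 2], [0, 1, 1, 0, 2]),
    "commits":   ([2, 4, 1, 6, 3], [1, 1, 2, 1, 1]),
    "additions": ([57, 120, 30, 400, 80], [19, 25, 10, 60, 15]),
}

raw_p = []
for gdpr, non_gdpr in metrics.values():
    _, p = mannwhitneyu(gdpr, non_gdpr, alternative="two-sided")
    raw_p.append(p)

_, adj_p, _, _ = multipletests(raw_p, method="fdr_bh")
for name, p in zip(metrics, adj_p):
    print(f"{name}: adjusted p = {p:.4f}")
```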
Table 4: GDPR (G) vs. Non-GDPR (non-G) GitHub Activity Metrics.

| Characteristic | Type | Median | p-value |
|----------------------|------|--------|---------|
| Comments* | G | 1 | < 0.0001 |
| | non-G| 1 | (U = 1.4E8, r = 0.09) |
| Active time (days)* | G | 418.05 | < 0.0001 |
| | non-G| 1.78 | (U = 1.4E8, r = 0.14) |
| Commits* | G | 2 | < 0.0001 |
| | non-G| 1 | (U = 1.4E8, r = 0.04) |
| Additions* | G | 57 | < 0.0001 |
| | non-G| 19 | (U = 1.5E8, r = 0.05) |
| Deletions* | G | 7 | < 0.0001 |
| | non-G| 4 | (U = 1.3E8, r = 0.05) |
| Changed files* | G | 4 | < 0.0001 |
| | non-G| 2 | (U = 1.4E8, r = 0.03) |

* denotes statistically significant results (p-value < 0.05)

#### 4.2 RQ2: GDPR Perceptions

This question was: **RQ2: How do OSS developers perceive fulfilling GDPR requirements?**

##### 4.2.1 Survey

We asked participants about their perceptions of the impact of GDPR regulations on privacy. Of the participants who responded to this question (n = 25), most had negative opinions of the GDPR. Three participants were neutral (e.g., “N/A” (P4)). We summarize positive and negative perceptions next.

**Negative Perceptions:** Despite the utility of data privacy regulations, 22 participants reported negative perceptions of the GDPR. These responses primarily focused on three issues: cost, organizations, and enforcement. For costs, respondents noted that implementing GDPR requirements is expensive and burdensome. Participants said that compliance is “costly for many companies” (P16), is “too expensive” (P24), and that “the cost of protection should not go over the cost of consequence of data breach...GDPR [isn’t] worth the time” (P46). P55 also highlighted that “in general there have been major costs to companies of all sizes” regarding GDPR implementations. For organizations, participants reported a negative impact of the GDPR on companies and organizations. They mentioned that GDPR compliance “weakens small and medium-sized enterprises” (P15), “threatens innovation” (P18), “fails to meaningfully integrate the role of privacy-enhancing innovation and consumer education in data protection” (P23), and that “in order to be safer than risky useful functionality is removed” (P52). P46 added that the GDPR is “a lot of headache...jobs for lawyers at the expense of people who are trying to solve real problems”. For enforcement, one subject said “there is a large gap in GDPR enforcement among member states” (P17) and another observed “the trend...is an increase in the number of times and the amount of fines” (P18). Similarly, P49 described GDPR as “a big hammer”, but was unsure “if it has necessarily increased security and privacy at this point”.

**Positive Perceptions:** Eight participants had positive perceptions of the GDPR, generally stating that GDPR enhances data privacy for users. For example, participants said that “the risk of incurring and paying out hefty fines has made companies take privacy and security more proactively” (P30), that GDPR brings “awareness to the importance about privacy” (P45), that “data integrity is ensured” (P47), and “customers can now delete their data quite easily” (P54). Participants also appreciated the increased accountability for corporations in safeguarding users’ data — for example one participant stated “Before GDPR data protection was usually considered only as an afterthought if not an outright joke.
Nowadays companies will at least consider what they are doing wrong before violating data protection laws, rather than doing it by accident because no-one even thought about it” (P50). These responses reflect the intentions of the GDPR — to safeguard the rights of users and their data online.

##### 4.2.2 Sentiment Analysis

We investigated the sentiment of developers implementing GDPR concepts by analyzing PR titles, commit messages, review comments, discussion comments, and bodies. Our overall results are in Table 5. We anticipated a higher percentage of negative comments for GDPR-related pull requests. However, we did not find evidence that GDPR-related PRs have less favorable sentiments from developers. In fact, we found they often had more positive sentiments than non-GDPR-related PRs — with two of the three models (Liu-Hu and VADER) indicating a statistically significant difference between the GDPR and non-GDPR sentiment. We speculate on two explanations. First, non-GDPR-related PRs represent a broad range of code contributions, which could address a number of issues. Second, we are limited by the capabilities of the sentiment analyzer. For example, the two most negative commit messages for non-GDPR pull requests said “obsolete” and “fatal”, which are common terms of art in software maintenance tasks [89, 113] (e.g., “fix fatal error”). We also observed some variation at the beginning and end of our dataset collection period, but no significant variation in sentiment over time (see Figure 1).

Table 5: GDPR (G) vs Non-GDPR (non-G) Sentiment Analysis

| Test | Type | Mean | Variance | p-value |
|----------|--------|-------|----------|---------|
| Liu-Hu* | G | 0.43 | 0.27 | $p < 0.0001$ ($t = -4.05$, $r = 0.22$) |
| | non-G | -0.04 | 0.28 | |
| VADER* | G | 0.44 | 0.04 | $p < 0.0001$ ($t = -6.47$, $r = 0.02$) |
| | non-G | 0.21 | 0.01 | |
| SentiArt | G | 0.39 | 0.01 | $p = 0.1399$ ($t = -1.10$, $r = 0.01$) |
| | non-G | 0.36 | 0.002 | |

* denotes statistically significant results (p-value < 0.05)

Figure 1: Longitudinal GDPR (G) and Non-GDPR (non-G) Sentiment Analysis Data. We grouped GDPR and non-GDPR data into 3-month segments and used 3 sentiment models. For each model, GDPR data is plotted in a color with a filled marker, and non-GDPR data in the same color but with a hollow marker. The general trend is that sentiment for GDPR data is moderately positive, and more positive than for non-GDPR data.
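The 3-month binning behind Figure 1 can be sketched as follows; the DataFrame columns are hypothetical, and quarter-start bins are used here as an approximation of the 3-month segments.

```python
# Minimal sketch with hypothetical data: mean compound sentiment per
# 3-month segment of PR creation dates.
import pandas as pd

df = pd.DataFrame({
    "created_at": pd.to_datetime(["2018-05-02", "2018-06-20", "2018-09-11",
                                  "2019-01-30", "2019-02-14"]),
    "compound":   [0.44, 0.10, 0.62, 0.35, 0.51],
})

trend = df.set_index("created_at").resample("QS")["compound"].mean()
print(trend)
```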
Nonetheless, manual inspection of negatively scored content showed OSS developers expressing frustration with GDPR compliance. For instance, one title and commit message described GDPR-related changes to “avoid lawsuits by mentioning cookies thing” [91]. Another title states adding “just enough EULA [end user license agreement] not to get banned” [31]. Similar frustrations were shared in a PR body for “GDPR stuff” adding changes to “display the annoying cookies banner” [104]. Discussion comments, such as “will this conflict with GDPR?” [24], also highlight OSS developers’ confusion with GDPR requirements.

**Finding 3:** Despite its nominal advantages, most developers had negative perceptions of the GDPR and its implementation.
**Finding 4:** We found developers did not express more negative sentiments about GDPR compliance in PR discussions.
**Finding 5:** Sentiment related to GDPR compliance appears to be stable over time.

#### 4.3 RQ3: Implementation Challenges

This question was: **RQ3: What GDPR concepts do OSS developers find most challenging to implement?** In the survey data, we observed three common challenges: data management, data protection, and vague requirements.

**Data Management:** 11 developers responded that processing and storing users’ data according to GDPR requirements is the most challenging concept to implement. For example, participants mentioned challenges implementing “data protection” (P24), handling “personal data” (P34), the “exchange of documents containing personal data” (P32), the “improper storage” (P30) of user data, and “knowing what info can or cannot be accessed or saved” (P49). In particular, four participants mentioned users’ right to erasure — the obligation for data controllers to delete users’ data upon request “without undue delay” [4] — as the most complicated requirement to implement. For example, P53 responded, “it’s not always easy enough to implement data processing in a way, that it’s anonymized, and if the user would like their data to be erased, be able to continue processing of the results based on user data in an anonymous way” — describing the complexity of this requirement for their project.

**Data Protection:** Five participants mentioned security factors as a challenge for GDPR compliance. For instance, participants were concerned with “data protection” and “other security concerns” (P24), “leaks” (P27), and the fact that other entities have “the ability to steal data” (P28). P55 noted challenges with handling and securing data in “central databases, where that data may be relied on by many loosely connected applications and systems”. These responses highlight the difficulties of implementing mechanisms to safeguard users’ data.

**Vague Requirements:** 10 survey respondents highlighted a lack of clear requirements as the biggest challenge with GDPR compliance in OSS. For example, one participant mentioned that GDPR “is pretty vague” with a lack of “standard format” (P54). Another described confusion in knowing “how long can data be retained” and “what is Personal[sic] Identifiable Information” — adding that the “lack of clarity in the regulations[sic] leads to confusion” (P52). Moreover, P48 highlighted that companies’ limited understanding of GDPR requirements makes compliance difficult.

Beyond these clear categories, we also received a wide range of other responses, including “lawfulness and dispute resolution” (P47), the conflict between “individual privacy and the public’s right to know” (P21), and being in a “rush to regulate” (P28). P27 mentioned challenges with user experiences, stating “users endure invasive pop-ups”.
Further, P1 noted that the challenges evolve during the lifetime of a project, stating “At the beginning of a project, privacy by design and default. In the middle or the end, data minimization and transparency” are the main challenges. Based on the challenges of implementation, participants described difficulties such as limiting functionality — e.g., “knowing when interacting with EU citizens” (P49) and “more than 1,000 news websites in the European Union have gone dark” (P15). Meanwhile, P17 mentioned difficulties implementing GDPR requirements for data-intensive programming domains: “many of the GDPR’s requirements are essentially incompatible with big data, artificial intelligence, blockchain, and machine learning”. These challenges motivate new resources to help developers overcome problems related to GDPR implementation and compliance.

**Finding 6:** The management and protection of user data and vague requirements are key challenges open-source developers face when implementing GDPR requirements.

#### 4.4 RQ4: Compliance Assessment

This question was: **RQ4: How do OSS developers assess GDPR compliance?** We found three kinds of responses related to compliance assessment: consulting with legal counsel, referencing other compliance resources, and self-assessment.

**Compliance Through Legal Counsel:** In our survey results, 15 OSS developers reported consulting with legal teams for GDPR compliance. We were also interested in exploring the impact of seeking legal counsel for GDPR compliance on OSS development processes. Seven participants with experience seeking legal consultations noted that it did have a positive impact on development activity (P6, P13, P14, P45, P53, P55, P56). Participants noted the benefits of consulting legal experts, stating the importance of “consulting with lawyers on the team who have a seat at the table” (P45), that it “clarifies requirements and prevents misinterpretations” (P55), and that it allowed GDPR compliance to be “implemented rather easily” (P56).

However, most participants (n = 9) with experience seeking legal counsel lamented the impact, stating it decreased development productivity: “it slows things down as code has to be reviewed and objectives revised” and “it impacted our approach to the SDLC” (P1), “it’s a bit of a headache” (P24), “it slowed us down...was mostly a box ticking exercise” (P51), and “it interrupted the development but it is required” (P49). Respondents also bemoaned the costs of working with legal teams, stating “for a global project open source project any legal advice would be extremely expensive” (P52) and “open-source projects can’t afford even to sustain maintainers, not even speaking about legal team...Legal teams are consulted with some corps want to kill the project” (P47). P54 also noted legal experts found difficulties with the vagueness of GDPR compliance, replying that the “legal team struggles to interpret how to comply with GDPR, there are a lot of back-and-forth. We have to change our design many times”.

In sum, legal experts can provide valuable insight into data privacy regulations and compliance, but developers often find these interactions negatively impact development processes.

**Compliance Resources:** To assess GDPR compliance, three participants mentioned a variety of other resources.
One participant described formal training on regulatory compliance, with a “special training on GDPR within the company” (P16). Another participant responded that their team uses an “accountability system” (P24) to assess compliance. Finally, P15 noted using online resources to help, but highlighted their ineffectiveness, stating, “many of the articles on the Internet about GDPR are incomplete or even wrong”.

**Self-assessment:** Other developers mentioned they were largely responsible for evaluating the “legality” (P18) and “integrity and confidentiality” (P23) of the processing and storage of user data in their system on their own. P24 responded that developers have to “consider whether you really need all the data you collect” while P38 advised to “get your consent in order”. P53 noted the impact on development teams, stating GDPR implementations “took us significant amount of time due to several rounds of architecture review”. P18 added there is “really no good way” to evaluate compliance.

**Finding 7:** Developers often do not consult legal experts to validate GDPR compliance, relying on other resources such as compliance training, accountability systems, online resources, and self-assessed data management.

**Finding 8:** Participants with experience interacting with legal teams provided mixed perceptions, feeling they provided valuable insight but hindered development processes.

### 5 DISCUSSION AND FUTURE WORK

Our results demonstrate that GDPR-related code changes have a major impact on OSS development, significantly increasing development activity with regard to the number of lines of code added and the number of commits included in PRs, indicating increased effort in code contribution and code review activities for developers (§4.1.2). Further, we found that GDPR compliance presents a wide range of challenges for OSS development (§4.3) and that developers often assess compliance without the help of legal and policy experts (§4.4). These findings suggest that implementing GDPR compliance is a challenging activity for OSS developers.

We recognize many stakeholders are involved in adhering to data privacy legislation. For instance, policymakers also play a role in data privacy compliance [112]. Data privacy regulations, such as the GDPR, are beneficial for protecting the rights and data of users online. However, we noticed developers complaining about the work of providing privacy to people, holding negative perceptions of the GDPR policy in general and of its implementation. To that end, we provide guidelines to enhance data privacy regulations and software development processes to reduce the negative effects of policy compliance in OSS development.

#### 5.1 Improving Data Privacy Regulations

5.1.1 Provide Clear Requirements. We found developers struggled to implement GDPR concepts (§4.3). Moreover, few respondents reported consulting with legal experts to provide insight into policies and assess the compliance of projects (§4.4). Thus, most development teams are forced to evaluate the system themselves. Yet, participants complained that understanding compliance is difficult due to the ambiguity of GDPR concepts: for instance, “the procedure for obtaining user consent and the information provided are unclear” (P25). Prior work suggests ambiguity is a main challenge in requirements engineering [28].
Further, incomplete requirements can increase development costs and the probability of project failure [38].

To improve program specifications, researchers have explored a variety of techniques. For instance, Wang et al. explored using natural language processing to automatically detect ambiguous terminology in software requirements [111]. Similar techniques could be applied to regulations such as the GDPR to notify policymakers of unclear language and clarify requirements for software engineers. Another way to improve the clarity of requirements is to involve software developers in the policy-making process. Verdon argues a good policy must be “understandable to [its] audience” [109, p. 48], yet our results show developers are confused by GDPR requirements. Prior work shows collaboration between policy makers and practitioners improves policies in domains such as public health [37] and education [61]. Thus, developers should be incorporated into the policy-making process to provide input on the impact of implementing and complying with policies concerning software development, such as data privacy regulations.

**5.1.2 Policy Resources.** Our survey results show OSS developers face challenges implementing GDPR-related changes (§4.3). Participants also found legal consultations negatively affect development processes (§4.4) and report that existing resources are largely ineffective, primarily relying on self-assessment within the development team. Only one participant mentioned receiving formal training on GDPR compliance (P16). To that end, OSS developers largely resort to implementing and evaluating compliance through their own efforts with “insufficient information” (P26). Prior work also outlines issues with software developers and security policies, noting a lack of understanding from programmers [109].

Based on our findings, we posit OSS development can benefit from novel resources to educate developers on policies and their implementation. To further support compliance, policymakers can provide resources, such as guides or online forums, that present information on data privacy-related concepts in an accessible manner. Such resources can also reduce the effects of GDPR compliance on code review tasks by providing specialized expertise and a correct understanding for reviewers [85]. Yet, there are few online developer communities focused on seeking help with data privacy policy implementation. Popular programming-related Q&A websites, e.g., Stack Overflow, are frequently used by developers to ask questions and seek information online [86], and are used for discussions on data privacy policy implementation (see Table 1). However, developers have no way to verify the correctness of responses, which can also become obsolete over time. Zhang et al. recommend automated tools to identify outdated information in responses for development concepts, such as API libraries and programming languages [116]. A similar approach could be used to keep responses regarding GDPR compliance up-to-date and accurate.

#### 5.2 Improving Development Processes

**5.2.1 Privacy by Design.** Participants reported challenges implementing GDPR compliance (§4.3) and negative effects on development practices (§4.1.1). Moreover, our GitHub analysis found GDPR-related changes necessitated significantly more time and effort (i.e., comments, commits, etc.) for developers to implement and review in PRs (see Table 4).
However, compliance is required for organizations to avoid “paying out hefty fines” (P30). Researchers have investigated techniques to streamline the incorporation of privacy into development processes. For instance, Privacy By Design (PBD) is a software development approach that makes privacy the “default mode of operation” [35]. P50 mentioned that cultivating “a privacy-respecting mindset long before GDPR came about” avoided negative impacts on development processes and made the effort required “quite minimal”. However, numerous participants noted the burden of implementing GDPR requirements, with one survey participant in particular (P1) highlighting that prioritizing privacy in software development processes “requires an overhaul”. Additionally, while PBD can benefit GDPR compliance efforts, Kurtz et al. observe a scarcity of research in this area and point out particular challenges with PBD for GDPR implementations, such as ensuring third-party libraries also adhere to privacy principles [70].

PBD can be effective for new projects starting from scratch [102], yet may be ill-equipped for existing projects complying with new and changing data privacy regulations. Anthonysamy et al. point out that current privacy requirements address present issues, which may differ from the regulations and policies of the future [25]. More work is needed to explore tools and processes that support data privacy in mature software projects. One solution could be a partial or gradual approach to compliance. For instance, some programming languages (e.g., TypeScript) support gradual typing to selectively check for type errors in code [93]. Similarly, research in formal methods has explored supporting gradual verification of programs [26]. Thus, gradually introducing privacy into OSS could reduce the effort of GDPR compliance, as opposed to overhauling development processes to prioritize privacy.

**5.2.2 Automated Tools.** We found GDPR compliance has a major impact on OSS development, significantly increasing coding and reviewing tasks for PRs in GitHub repositories (see Table 4). Developers who responded to our survey also indicated the impact of GDPR compliance on their project source code, noting that data privacy regulations always need more software (P4) and violate the principle of minimum scope (P21). This indicates further difficulty for developers in validating their projects against the GDPR, with one participant responding there is “no good way” to assess compliance (P18). These findings point to an increased burden and effort on OSS developers to implement and review GDPR requirements to comply with data privacy regulations and avoid penalties for non-compliance (e.g., losing market share).

To that end, we posit automated tools can reduce the burden of GDPR implementation efforts. One participant mentioned using a tool, an “accountability system” (P24), to help assess compliance, but did not provide any details about this system. Our findings for RQ1 (§4.1) show GDPR-related pull requests involve significantly more coding, consisting of more commits and lines of code added in code contributions, as well as requiring significantly more comments and time in reviewing processes. Thus, systems to support data privacy implementation and tools to review policy-relevant code are needed to streamline compliance. Ferrara and colleagues present static analysis techniques to support GDPR compliance [42].
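To give a flavor of what rule-based checking of policy-relevant code can look like (a minimal sketch of our own, not the technique of [42]; the identifier patterns and consent heuristic are illustrative assumptions), the following script warns about files that reference personal-data fields without any nearby consent handling:

```python
import re
import sys
from pathlib import Path

# Illustrative heuristics only: identifiers that suggest personal data,
# and tokens that suggest consent is checked or recorded.
PERSONAL_DATA = re.compile(r"\b(email|phone|address|birth_date|ip_addr)\w*\b", re.I)
CONSENT_HINTS = re.compile(r"\b(consent|opt_in|gdpr)\w*\b", re.I)

def audit(root: str) -> None:
    """Flag files that mention personal data but never mention consent."""
    for path in Path(root).rglob("*.py"):
        text = path.read_text(errors="ignore")
        if PERSONAL_DATA.search(text) and not CONSENT_HINTS.search(text):
            print(f"WARN {path}: personal-data fields without consent handling")

if __name__ == "__main__":
    audit(sys.argv[1] if len(sys.argv) > 1 else ".")
```

Real static analyses reason about data flow rather than string matches, but even lightweight checks of this kind could be wired into continuous integration to surface policy-relevant code for review.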
Further tools can support review processes for assessing implementation changes. Prior work suggests static analysis tools can reduce the time and effort of code reviews [94]. Future systems could also provide automated feedback to developers and reviewers on data privacy regulation compliance, for instance by using NLP techniques [17] or rule-based machine learning approaches [51] to automatically summarize requirements and verify compliance.

#### 5.3 Other Directions

Based on our results, we observe several other avenues of future work. First, we plan to investigate other data sources to further explore GDPR compliance in open-source projects. For example, we plan to mine relevant queries from Stack Overflow to gain insight into the challenges and information needs developers have when implementing GDPR policies. We will also examine answers to observe how developers respond. For instance, online discussions between developers regarding policies often use disclaimers, such as the acronyms “IANAL” or “NAL” to indicate “I am not a lawyer”, before offering advice or answering questions related to legal frameworks. Without legal expertise, we anticipate it is difficult for OSS developers to offer guidance and seek help complying with data privacy regulations, motivating the need for novel approaches to support regulation adherence and compliance assessment.

Moreover, we aim to engage with policymakers to understand their perspectives on data privacy policies and the challenges developers face implementing them. We will collect qualitative insights from politicians and individuals with authority to develop policies to further explore methods to support the implementation of privacy laws. Finally, we aim to extend this work to investigate the impact of broader technology-related policies on open-source software development practices, for instance by investigating the impact of alternative data privacy regulations (e.g., the CCPA or CDPA) as well as other legal frameworks that will affect software development and maintenance, such as current and imminent legislation regarding artificial intelligence governance.

### 6 RELATED WORK

We note two lines of related work: characterizations of stakeholder perspectives on data privacy regulations, and technical and methodological approaches for regulatory compliance.

**Stakeholder perspectives:** Research has investigated perspectives on the GDPR for stakeholders in data privacy regulation compliance. Sirur and colleagues examined organizational perceptions of the feasibility of implementing GDPR concepts, finding that larger organizations were confident in their ability to comply while smaller companies struggled with the breadth and ambiguity of GDPR requirements [95]. Earp et al. surveyed software users to show that the Internet privacy protection goals and policies of online websites do not meet users’ expectations for privacy [41]. Similarly, Strycharz et al. surveyed consumers to uncover frustrations and negative attitudes related to the GDPR [100]. Our work focuses on the perceptions of developers, who are responsible for implementing code changes to comply with data privacy regulations.

On the perspective of software engineers as regulatory stakeholders, van Dijk and colleagues provide an overview of the transition of privacy policies from self-imposed developer guidelines to legal frameworks and legislation [107].
Alhazmi and Arachchilage interviewed software developers to uncover barriers to adopting GDPR principles, finding that a lack of familiarity, established techniques, useful help resources, and prioritization from employers prevents compliance. They also found that developers generally do not prioritize privacy features in their projects, focusing instead on functional requirements [20]. Similarly, researchers interviewed senior engineers to understand the challenges of implementing general privacy guidelines, indicating frustration with legal interactions and the non-technical aspects of requirements [29]. Finally, Klymenko et al. interviewed technical and legal professionals to investigate measures for data privacy compliance in GDPR implementation, noting a lack of understanding and a need for interdisciplinary solutions [66]. While these papers take approaches similar to ours, our goals and questions are distinct, since we are specifically interested in the perspective of open-source developers.

**Implementing and verifying GDPR compliance:** Prior work has explored approaches to implement and verify GDPR compliance. For instance, Martín et al. recommend Privacy by Design methods and tools for GDPR compliance [78]. Shastri and colleagues introduce GDPRBench, a tool to assess the GDPR compliance of databases [92]. Li et al. investigated automated GDPR compliance as part of continuous integration workflows [74]. Al-Slais conducted a literature review to develop a taxonomy of privacy implementation approaches to guide GDPR compliance [19]. Finally, Mahindrakar et al. proposed the use of blockchain technologies to validate personal data compliance [77]. Rather than proposing new software engineering methods, measures, and tools related to GDPR, our work takes an empirical perspective to understand current practices.

### 7 THREATS TO VALIDITY

We discuss three types of threats to validity.

**Construct:** In mining OSS repositories, we defined the construct of “GDPR-related pull requests” based on the presence of the string “GDPR”. Some PRs may refer to the GDPR incorrectly (false positives), while others may make GDPR-relevant changes without using the acronym (false negatives). The construct is also biased towards English speakers, as the acronym differs in other languages. To mitigate non-English GDPR-related PRs polluting the non-GDPR-related dataset, we manually inspected PR titles for variants of the GDPR acronym in other languages, including “RGPD” (French, Spanish, and Italian), “DSGVO” (German), and “AVG” (Dutch). However, these were not included in our GDPR-related dataset, since we only focus on PRs in English for our analysis. We used off-the-shelf NLP techniques to assess sentiment, inheriting the biases of these methods (e.g., misinterpreted connotations of homonyms such as “mock”). In addition, parametric models for sentiment analysis are based on predefined dictionary values and cannot detect certain aspects of human communication, such as sarcasm. Prior work also suggests sentiment analysis tools can be inaccurate in software engineering contexts [64]. However, we use sentiment analysis only to gain preliminary insights into developers’ perceptions of GDPR compliance in OSS.

**Internal:** We perceive no internal threats. This study provides characterizations rather than cause-effect measurements.

**External:** There are several threats to the generalizability of our findings.
We inherit the standard perils of mining open-source software [65]. We focus on open-source software available on GitHub, which omits other code hosting platforms, such as GitLab, that may be used by different populations of developers. We doubt our results generalize to commercial software, since those development organizations directly face the consequences of GDPR non-compliance. We only consider the effect of the GDPR because it is the most prominent privacy law and hence has the most available data. Other regulations may have different effects. Specifically, we conjecture differences in the software engineering impact between general data privacy regulations, such as the GDPR and CCPA, and industry-specific data privacy regulations, such as HIPAA and FERPA: general regulations may necessarily be more ambiguous.

### 8 CONCLUSIONS

Data privacy regulations are being introduced to prevent data controllers from misusing users’ information and to protect individuals. To adhere to these regulations, developers are charged with the complex task of understanding policies and modifying the source code of applications to implement privacy-related requirements. This work examines the impact of data privacy regulations on software development processes by investigating code contributions and developer perceptions of GDPR compliance in open-source software. Our results show that complying with data privacy regulations significantly impacts development activities on GitHub, evoking negative perceptions and frustrations from developers. Our findings provide implications for developers and policymakers to support the implementation of data privacy regulations that protect the rights of human users in digital environments.

### 9 DATA AVAILABILITY

We have uploaded the survey, datasets, and data collection and analysis scripts as supplementary materials [2]. Our IRB protocol does not allow us to share individual survey responses.

### 10 ACKNOWLEDGMENTS

Brown and Brantly acknowledge support from the Virginia Commonwealth Cyber Initiative (CCI).

### REFERENCES

[1] [n. d.]. https://edps.europa.eu/data-protection/data-protection/legislation/history-general-data-protection-regulation_en
[2] [n. d.]. https://anonymous.4open.science/r/GDPR-OSS-Impact-D77B
[3] [n. d.]. MIT License. https://opensource.org/licenses/MIT. Accessed July 2023.
[4] [n. d.]. Right to erasure (‘right to be forgotten’). https://gdpr-info.eu/art-17-gdpr/
[5] 1974. Family Educational Rights and Privacy Act of 1974. 20 U.S.C. § 1232g; 34 CFR Part 99. https://www2.ed.gov/policy/gen/guid/fpco/ferpa/index.html
[6] 1991. GNU General Public License, version 2. Free Software Foundation. https://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
[7] 1996. Health Insurance Portability and Accountability Act of 1996. Pub. L. No. 104-191, 110 Stat. 1936. https://www.govinfo.gov/content/pkg/PLAW-104publ191/pdf/PLAW-104publ191.pdf
[8] 2004. Apache License, Version 2.0. Apache Software Foundation. https://www.apache.org/licenses/LICENSE-2.0
[9] 2010. IEC 61508-1:2010 - Functional safety of electrical/electronic/programmable electronic safety-related systems – Part 1: General requirements. International Electrotechnical Commission. https://webstore.iec.ch/publication/5512
[10] 2014. ISO 90003:2014 - Software engineering – Guidelines for the application of ISO 9001:2015 to computer software. International Organization for Standardization. https://www.iso.org/standard/59149.html
[11] 2015. ISO 9001:2015 - Quality management systems – Requirements. International Organization for Standardization. https://www.iso.org/standard/62085.html
[12] 2016. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). Official Journal of the European Union. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32016R0679
[13] 2018. ISO 26262-1:2018 - Road vehicles – Functional safety – Part 1: Vocabulary. International Organization for Standardization. https://www.iso.org/standard/68383.html
[14] 2023. 5th State of CCPA & GDPR Privacy Rights Compliance Research Report – Q4 2022. Cytrio. https://cytrio.com/wp-content/uploads/2023/02/5th-State-of-CCPA-GDPR-Compliance-Report_FNL2.pdf
[15] 2023. GDPR Enforcement Tracker – list of GDPR fines. Enforcement Tracker. https://www.enforcementtracker.com
[16] Ahmad Abdellatif, Mairieli Wessel, Igor Steinmacher, et al. 2022. BotHunter: an approach to detect software bots in GitHub. In Proceedings of the 19th International Conference on Mining Software Repositories. 6–17.
[17] Abdel-Jaoued Aberkane, Geert Poels, and Seppe Vanden Broucke. 2021. Exploring automated GDPR-compliance in requirements engineering: A systematic mapping study. IEEE Access 9 (2021), 66542–66559.
[18] Saeed Akhlaghpour, Farkhondeh Hassandoust, et al. 2021. Learning from enforcement cases to manage GDPR risks. MIS Quarterly Executive 20, 3 (2021).
[19] Yaqoob Al-Slais. 2020. Privacy Engineering Methodologies: A survey. In 2020 International Conference on Innovation and Intelligence for Informatics, Computing and Technologies (3ICT). 1–6. https://doi.org/10.1109/3ICT51146.2020.9311949
[20] Abdulrahman Alhazmi and Nalin Asanka Arachchilage. 2021. I’m all ears! Listening to software developers on putting GDPR principles into software development practice. Personal and Ubiquitous Computing 25, 5 (2021), 879–892.
[21] Reni Allan. 2007. Reskilling for compliance. Inf. Professional 4, 1 (2007), 20–23.
[22] Fernando Almeida and José Augusto Monteiro. 2021. Exploring the effects of GDPR on the user experience. Journal of Information Systems Engineering and Management 6, 3 (2021).
[23] Murugan Anandarajan, Chelsey Hill, and Thomas Nolan. 2019. Text preprocessing. Practical Text Analytics: Maximizing the Value of Text Data (2019), 45–59.
[24] Maythee Anegboonlap. 2018. Will this conflict with GDPR? https://github.com/ReferaCandy/woocommerce-refera-candy/pull/24#discussion_r238153546. GitHub repository: ReferaCandy/woocommerce-refera-candy.
[25] Pauline Anthonysamy, Awais Rashid, and Ruzanna Chitchyan. 2017. Privacy requirements: present & future. In IEEE/ACM International Conference on Software Engineering: Software Engineering in Society (ICSE-SEIS). IEEE, 13–22.
[26] Johannes Bader, Jonathan Aldrich, and Éric Tanter. 2018. Gradual program verification. In Verification, Model Checking, and Abstract Interpretation (VMCAI). Springer, 25–46.
[27] Ben Balter. 2015. Open source license usage on GitHub.com. GitHub Blog. https://github.blog/2015-03-09-open-source-license-usage-on-github-com/
[28] Muneera Bano. 2015. Addressing the challenges of requirements ambiguity: A review of empirical literature. In 2015 IEEE Fifth International Workshop on Empirical Requirements Engineering (EmpiRE). IEEE, 21–24.
[29] Kathrin Bednar, Sarah Spiekermann, and Marc Langheinrich. 2019. Engineering Privacy by Design: Are engineers ready to live up to the challenge? The Information Society 35, 3 (2019), 122–142.
[30] Yoav Benjamini and Yosef Hochberg. 1995. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological) 57, 1 (1995), 289–300. http://www.jstor.org/stable/2346181
[31] Ani Betts. 2021. Just enough EULA to not get banned. https://github.com/anaisbets/sirene/pull/37. GitHub repository: anaisbets/sirene.
[32] Alex Bowyer, Jack Holt, Josephine Go Jefferies, Rob Wilson, David Kirk, and Jan David Smeddinck. 2022. Human-GDPR interaction: Practical experiences of accessing personal data. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. 1–19.
[33] Randolph E. Bucklin and Catarina Sismeiro. 2009. Click here for Internet insight: Advances in clickstream data analysis in marketing. Journal of Interactive Marketing 23, 1 (2009), 35–48.
[34] Noel Carroll and Ita Richardson. 2016. Software-as-a-medical device: demystifying connected health regulations. Journal of Systems and Information Technology 18, 2 (2016), 186–215.
[35] Ann Cavoukian. 2009. Privacy by design. (2009).
[36] David Chisnall. 2012. The Go Programming Language Phrasebook. Addison-Wesley.
[37] Bernard CK Choi, Tikki Pang, Vivian Lin, et al. 2005. Can scientists and policy makers work together? Journal of Epidemiology & Community Health 59, 8 (2005), 632–637.
[38] Tom Clancy. 1995. The CHAOS report. The Standish Group (1995).
[39] Jacob Cohen. 2013. Statistical Power Analysis for the Behavioral Sciences. Routledge.
[40] Jose Luis de la Vara, Markus Borg, Krzysztof Wnuk, and Leon Moonen. 2016. An industrial survey of safety evidence change impact analysis practice. IEEE Transactions on Software Engineering 42, 12 (2016), 1095–1117.
[41] J.B. Earp, A.I. Antón, L. Aiman-Smith, and W.H. Stufflebeam. 2005. Examining Internet privacy policies within the context of user privacy values. IEEE Transactions on Engineering Management 52, 2 (2005), 227–237.
[42] Pietro Ferrara and Fausto Spoto. 2018. Static analysis for GDPR compliance. In CEUR Workshop Proceedings. 1–10.
[43] Aaron J Fischer, Brandon K Schultz, Melissa A Collier-Meek, et al. 2018. A critical review of videoconferencing software to support school consultation. International Journal of School & Educational Psychology 6, 1 (2018), 12–22.
[44] Lucas Franke, Huayu Liang, Aaron Brantly, James C. Davis, and Chris Brown. 2024. A First Look at the General Data Protection Regulation (GDPR) in Open-Source Software. In Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (Lisbon, Portugal) (ICSE Companion ’24). Association for Computing Machinery, New York, NY, USA, 268–269. https://doi.org/10.1145/3639478.3643077
[45] GDPR. 2018. Art. 4 GDPR: Definitions. https://gdpr.eu/article-4-definitions/
[46] GDPR. 2018. Art. 83 GDPR: General conditions for imposing administrative fines. https://gdpr.eu/article-83-conditions-for-imposing-administrative-fines/
[47] GitHub. 2022. Octoverse 2022: The state of open source software. https://octoverse.github.com
[48] GitHub. 2023. Creating a pull request. https://help.github.com/en/articles/creating-a-pull-request. GitHub Help.
[49] Georgios Gousios and Andy Zaidman. 2014. A dataset for pull-based development research. In Conference on Mining Software Repositories. 368–371.
[50] Emitza Guzman, David Azócar, and Yang Li. 2014. Sentiment analysis of commit comments in GitHub: an empirical study. In Mining Software Repositories (MSR).
[51] Rajaa El Hamdani et al. 2021. A combined rule-based and machine learning approach for automated GDPR compliance checking. In Eighteenth International Conference on Artificial Intelligence and Law. 40–49.
[52] Nikolay Harutyunyan. 2020. Managing your open source supply chain – why and how? Computer 53, 6 (2020), 77–81.
[53] Paul Hitlin, Lee Rainie, and Kenneth Olmstead. 2019. Facebook Algorithms and Personal Data. Pew Research Center. https://www.pewresearch.org/internet/2019/01/16/facebook-algorithms-and-personal-data/
[54] Chris Hobbs. 2019. Embedded Software Development for Safety-Critical Systems. CRC Press.
[55] Sebastian Holst. 2017. GDPR liability: software development and the new law. LinkedIn (2017). https://www.linkedin.com/pulse/gdpr-liability-software-development-new-law-sebastian-holst/
[56] Minqing Hu and Bing Liu. 2004. Mining opinion features in customer reviews. In AAAI, Vol. 4. 755–760.
[57] Syed Fatiul Huq, Ali Zafar Sadiq, and Kazi Sakib. 2019. Understanding the effect of developer sentiment on fix-inducing changes: An exploratory study on GitHub pull requests. In 2019 26th Asia-Pacific Software Engineering Conference (APSEC). IEEE, 514–521.
[58] Clayton Hutto and Eric Gilbert. 2014. VADER: A parsimonious rule-based model for sentiment analysis of social media text. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 8. 216–225.
[59] International Association of Privacy Professionals. Accessed 2023. Global Comprehensive Privacy Law Mapping Chart. https://iapp.org/resources/article/global-comprehensive-privacy-law-mapping-chart/
[60] International Electrotechnical Commission. 2010. Functional safety of electrical/electronic/programmable electronic safety-related systems – Part 3: Software requirements. https://webstore.iec.ch/publication/9277
[61] Chongtao Jia, Mihai Stănescu, and Elham Marin. 2019. How can researchers facilitate the utilisation of research by policy-makers and practitioners in education? Research Papers in Education 34, 4 (2019), 483–498.
[62] Jim Isaak and Mina J. Hanna. 2018. User Data Privacy: Facebook, Cambridge Analytica, and Privacy Protection. Computer 51, 8 (2018), 56–59.
[63] Arthur M. Jacobs. 2019. Sentiment analysis for words and fiction characters from the perspective of computational (neuro-)poetics. Frontiers in Robotics and AI 6 (2019), 53.
[64] Robbert Jongeling, Proshanta Sarkar, Subhajit Datta, and Alexander Serebrenik. 2017. On negative results when using sentiment analysis tools for software engineering research. Empirical Software Engineering 22 (2017), 2543–2584.
[65] Eirini Kalliamvakou, Georgios Gousios, Kelly Blincoe, Leif Singer, Daniel M. German, and Daniela Damian. 2014. The promises and perils of mining GitHub. In 11th Working Conference on Mining Software Repositories (MSR). 92–101.
[66] Oleksandra Klymenko, Oleksandr Kosenkov, Stephen Meisenbacher, Parisa Elahidoost, Daniel Mendez, and Florian Matthes. 2022. Understanding the implementation of technical measures in the process of data privacy compliance: A qualitative study. In Proceedings of the 16th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement. 261–271.
[67] Michael Kretschmer, Jan Pennekamp, and Klaus Wehrle. 2021. Cookie banners and privacy policies: measuring the impact of the GDPR on the web. ACM Transactions on the Web (TWEB) 15, 4 (2021), 1–42.
[68] Oksana Kulyk, Nina Gerber, Annika Hilt, et al. 2020. Has the GDPR hype affected users’ reaction to cookie disclaimers? Journal of Cybersecurity (2020), 88–95.
[69] Aman Kumar, Manish Khare, and Saurabh Tiwari. 2022. Sensitivity Analysis of Developers’ Comments on GitHub Repository: A Study. In International Conference on Advanced Computational Intelligence (ICACI). IEEE, 91–98.
[70] Christian Kurtz, Martin Semmann, and Tilo Böhmann. 2018. Privacy by design to comply with GDPR: a review on third-party data processors. (2018).
[71] Daniël Lakens. 2013. Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Frontiers in Psychology 4 (2013), 863.
[72] Roslyn Layton and Silvia Elaluf-Calderwood. 2019. A social economic analysis of the impact of GDPR on security and privacy practices. In 2019 12th CMI Conference on Cybersecurity and Privacy (CMI). IEEE, 1–6.
[73] Thomas W MacFarland and Jan M Yates. 2016. Mann–Whitney U test. Introduction to Nonparametric Statistics for the Biological Sciences Using R (2016), 103–132.
[74] Abhishek Mahindrakar and Karuna Pande Joshi. 2020. Automating GDPR Compliance using Policy Integrated Blockchain. In IEEE Intl Conf on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conf on High Performance and Smart Computing (HPSC) and IEEE Intl Conf on Intelligent Data and Security (IDS). 86–93. https://doi.org/10.1109/BigDataSecurity-HPSC-IDS49724.2020.00026
[75] MH Lloyd and PJ Reeve. 2009. IEC 61508 and IEC 61511 assessments – some lessons learned. (2009).
[76] He Li, Lu Yu, and Wu He. 2019. The impact of GDPR on global technology development. Journal of Global Information Technology Management 22, 1 (2019).
[77] Ze Shi Li, Colin Werner, and Neil Ernst. 2019. Continuous Requirements: An Example Using GDPR. In 2019 IEEE 27th International Requirements Engineering Conference Workshops (REW). 144–149. https://doi.org/10.1109/REW.2019.00031
[78] MH Lloyd and PJ Reeve. 2009. IEC 61508 and IEC 61511 assessments – some lessons learned. (2009).
[79] Thomas W MacFarland and Jan M Yates. 2016. Mann–Whitney U test. Introduction to Nonparametric Statistics for the Biological Sciences Using R (2016), 103–132.
[80] Abhishek Mahindrakar and Karuna Pande Joshi. 2020. Automating GDPR Compliance using Policy Integrated Blockchain. In IEEE Intl Conf on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conf on High Performance and Smart Computing (HPSC) and IEEE Intl Conf on Intelligent Data and Security (IDS). 86–93. https://doi.org/10.1109/BigDataSecurity-HPSC-IDS49724.2020.00026
[81] Yod-Samuel Martín and Antonio Kung. 2018. Methods and tools for GDPR compliance through privacy and data protection engineering. In IEEE European Symposium on Security and Privacy Workshops. IEEE, 108–111.
[82] J M Valdez Mendia and J J A Flores-Cuautle. 2022. Toward customer hyper-personalization experience – A data-driven approach. Cogent Business & Management 9, 1 (2022), 2041384. https://doi.org/10.1080/23311975.2022.2041384
[83] Dan Milmo and Lisa O’Carroll. 2023. Facebook owner Meta fined €1.2bn for mishandling user information. The Guardian. https://www.theguardian.com/technology/2023/may/22/facebook-fined-mishandling-user-information-ireland-eu-meta
[84] Rene Moquin and Robin L Wakefield. 2016. The roles of awareness, sanctions, and ethics in software compliance. Journal of Computer Information Systems 56, 3 (2016).
[85] Frank Nagle, James Dana, Jennifer Hoffman, Steven Randazzo, and Yanuo Zhou. 2022. Census II of Free and Open Source Software – Application Libraries. Linux Foundation, Harvard Laboratory for Innovation Science (LISH) and Open Source Security Foundation (OpenSSF) 80 (2022).
[86] Chinenye Okafor et al. 2022. SoK: Analysis of software supply chain security by establishing secure design properties. In ACM SCORED Workshop. 15–24.
[87] Kang-il Park and Bonita Sharif. 2021. Assessing perceived sentiment in pull requests with emojis: evidence from tool and developer eye movements. In 2021 IEEE/ACM Sixth International Workshop on Emotion Awareness in Software Engineering (SEmotion). IEEE, 1–6.
[88] Luca Pascarella, Davide Spadini, et al. 2018. Information needs in contemporary code review. Proceedings of the ACM on Human-Computer Interaction: CSCW (2018).
[89] Cole S Peterson, Jonathan A Saddler, Natalie M Halavick, and Bonita Sharif. 2019. A gaze-based exploratory study on the information seeking behavior of developers on Stack Overflow. In CHI. 1–6.
[90] Daniel Pletea, Bogdan Vasilescu, and Alexander Serebrenik. 2014. Security and emotion: sentiment analysis of security discussions on GitHub. In Proceedings of the 11th Working Conference on Mining Software Repositories. 348–351.
[91] Supreeth Shastri et al. 2020. Understanding and benchmarking the impact of GDPR on database systems. VLDB 13, 7 (2020), 1064–1077.
[92] Jeremy Siek and Walid Taha. 2007. Gradual typing for objects. In European Conference on Object-Oriented Programming. Springer, 2–27.
[93] Devarshi Singh et al. 2017. Evaluating how static analysis tools can reduce code review effort. In 2017 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC). IEEE, 101–105.
[94] Sean Sirur, Jason R.C. Nurse, and Helena Webb. 2018. Are We There Yet? Understanding the Challenges Faced in Complying with the General Data Protection Regulation (GDPR). In 2nd International Workshop on Multimedia Privacy and Security (MPS). 1–16.
[95] Ian Sommerville. 2011. Software Engineering, 9/E. Pearson Education India.
[96] Jeff South. 2018. More than 1,000 U.S. news sites are still unavailable in Europe, two months after GDPR took effect. Nieman Lab. https://www.niemanlab.org/2018/08/more-than-1000-us-news-sites-are-still-unavailable-in-europe-two-months-after-gdpr-took-effect/
[97] Richard Sproat, Alan W Black, Stanley Chen, Shankar Kumar, Mari Ostendorf, and Christina D Richards. 2001. Normalization of non-standard words. Computer Speech & Language 15, 3 (2001), 287–333.
[98] David Stokes. 2012. 21 - Validation and regulatory compliance of free/open source software. In Open Source Software in Life Science Research, Lee Harland and Mark Forster (Eds.). Woodhead Publishing, 481–504.
[99] Joanna Strycharz, Jef Ausloos, and Natali Helberger. 2020. Data protection or data frustration? Individual perceptions and attitudes towards the GDPR. European Data Protection Law Review 6 (2020), 407.
[100] Synopsys. 2023. Open Source Security and Risk Analysis Report. https://www.pwc.com/us/en/services/consulting/library/gdpr-readiness.html
[101] Aurelia Tamò-Larrieux. 2018. Privacy by Design for the Internet of Things: A Startup Scenario. Designing for Privacy and its Legal Framework: Data Protection by Design and Default for the Internet of Things (2018), 203–226.
[102] Neil Thurman. 2020. Many EU visitors shut out of US sites in response to GDPR never came back. Reuters Institute for the Study of Journalism. https://reutersinstitute.politics.ox.ac.uk/news/many-eu-visitors-shut-out-us-sites-
[103] Serj Tubin. 2023. GDPR stuff. https://github.com/2beens/serj-tubin-vue/pull/71. GitHub repository: 2beens/serj-tubin-vue.
[104] UNCTAD. 2021. Data Protection and Privacy Legislation Worldwide. United Nations Conference on Trade and Development (2021). https://unctad.org/page/data-protection-and-privacy-legislation-worldwide
[105] Christine Utz, Martin Degeling, Sascha Fahl, et al. 2019. (Un)informed consent: Studying GDPR consent notices in the field. In ACM SIGSAC Conference on Computer and Communications Security (CCS). 973–990.
[106] N. van Dijk, A. Tanas, K. Rommetveit, and C. Raab. 2018. Right engineering? The redesign of privacy and personal data protection. International Review of Law, Computers & Technology 32, 2–3 (Apr 2018), 230–256. https://doi.org/10.1080/13600069.2014.1575022
[107] Ana Vazão, Leonel Santos, Maria Beatriz Piedade, and Carlos Rabadão. 2019. SIEM open source solutions: a comparative study. In 2019 14th Iberian Conference on Information Systems and Technologies (CISTI). IEEE, 1–5.
[108] Denis Verdon. 2006. Security policies and the software developer. IEEE Security & Privacy 4, 4 (2006), 42–49.
[109] Branka Vuleta. 2023. 10 unbelievable GDPR statistics in 2023. https://legaljobs.io/blog/gdpr-statistics/
[110] Yue Wang, Irene L Manotas Gutièrrez, Kristina Winbladh, and Hui Fang. 2013. Automatic detection of ambiguous terminology for software requirements. In 18th International Conference on Applications of Natural Language to Information Systems (NLDB). 75–85.
[111] R Kent Weaver. 2015. Getting people to behave: Research lessons for policy makers. Public Administration Review 75, 6 (2015), 806–816.
[112] Krzysztof Wnuk, Tony Gorschek, and Showary Zahda. 2013. Obsolete software requirements. Information and Software Technology 55, 6 (2013), 921–940.
[114] Christopher Wylie. 2019. How I Helped Hack Democracy. New York Magazine. https://nymag.com/intelligencer/2019/10/book-excerpt-mindf-ck-by-christopher-wylie.html
[115] Christopher Wylie. 2019. I Made Steve Bannon’s Psychological Warfare Tool: Meet the Cambridge Analytica Whistle-blower. New York Magazine. https://nymag.com/intelligencer/2019/10/book-excerpt-mindf-ck-by-christopher-wylie.html
[116] Haoxiang Zhang, Shaowei Wang, Tse-Hsun Chen, Ying Zou, and Ahmed E Hassan. 2019. An empirical study of obsolete answers on Stack Overflow. IEEE Transactions on Software Engineering 47, 4 (2019), 850–862.
{"id": "fd75f1a8fa0d835bde71a4db978aabeba0e790a6", "text": "Sustainability of Open Source software communities beyond a fork: How and why has the LibreOffice project evolved?\n\nJonas Gamalielsson*, Bj\u00f6rn Lundell\nUniversity of Sk\u00f6vde, P.O. Box 408, SE-541 28 Sk\u00f6vde, Sweden\n\nARTICLE INFO\nArticle history:\nReceived 19 October 2012\nReceived in revised form 7 November 2013\nAccepted 8 November 2013\nAvailable online 21 November 2013\n\nKeywords:\nOpen Source software\nFork\nCommunity evolution\n\nABSTRACT\nMany organisations are dependent upon long-term sustainable software systems and associated communities. In this paper we consider long-term sustainability of Open Source software communities in Open Source software projects involving a fork. There is currently a lack of studies in the literature that address how specific Open Source software communities are affected by a fork. We report from a study aiming to investigate the developer community around the LibreOffice project, which is a fork from the OpenOffice.org project. In so doing, our analysis also covers the OpenOffice.org project and the related Apache OpenOffice project. The results strongly suggest a long-term sustainable LibreOffice community and that there are no signs of stagnation in the LibreOffice project 33 months after the fork. Our analysis provides details on developer communities for the LibreOffice and Apache OpenOffice projects and specifically concerning how they have evolved from the OpenOffice.org community with respect to project activity, developer commitment, and retention of committers over time. Further, we present results from an analysis of first hand experiences from contributors in the LibreOffice community. Findings from our analysis show that Open Source software communities can outlive Open Source software projects and that LibreOffice is perceived by its community as supportive, diversified, and independent. The study contributes new insights concerning challenges related to long-term sustainability of Open Source software communities.\n\n\u00a9 2013 The Authors. Published by Elsevier Inc. Open access under CC BY license.\n\n1. Introduction\nMany organisations have requirements for long-term sustainable software systems and associated digital assets. Open Source software (OSS) has been identified as a strategy for implementing long-term sustainable software systems (Blondelle et al., 2012a; Lundell et al., 2011; M\u00fcller, 2008). For any OSS project, the sustainability of its communities is fundamental to its long-term success. In this study we consider long-term sustainability of communities in OSS projects involving a fork. Our overarching goal was to establish rich insights concerning how and why the LibreOffice project and associated communities have evolved the LibreOffice project and associated communities have evolved. More specifically, we report on commitment with the LibreOffice project, retention of committers, and insights and experiences from participants in the LibreOffice community. Overall, the study has revealed several key findings. First, the LibreOffice project, which was forked from the OpenOffice.org project, shows no sign of long-term decline. Second, the LibreOffice project has attracted the long-term and most active committers in the OpenOffice.org project. Third, our analysis shows that Open Source software communities can outlive Open Source software projects. 
Fourth, LibreOffice is perceived by its community as supportive, diversified, and independent.

The issue of forking OSS projects has been an ongoing issue of debate amongst practitioners and researchers. It has been claimed that “Indeed, the cardinal sin of OSS, that of project forking (whereby a project is divided in two or more streams, each evolving the product in a different direction), is a strong community norm that acts against developer turnover on projects” (Agerfalk and Fitzgerald, 2008). Further, it has been claimed that few forks are successful (Ven and Mannaert, 2008). Therefore, it is perhaps not surprising to see claims that “there must be a strong reason for developers to consider switching to a competing project” (Wheeler, 2007). However, it has also been argued that “forking has the capability of serving as an invisible hand of sustainability that helps open source projects to survive extreme events such as commercial acquisitions, as well as ensures that users and developers have the necessary tools to enable change rather than decay” (Nyman et al., 2012). Similarly, Brian Behlendorf, co-founder of the Apache Software Foundation, states that the “right to fork means that you don’t have to have any tolerance for dictators, you don’t have to deal with... people who make bad technical decisions – you can put the future into your own hands, and if you find a group of other people who agree with you, you can create a new project around it” (Severance, 2012). Another argument is that code forking can positively impact both the governance and sustainability of OSS projects at the levels of the software, its community and business ecosystem (Nyman and Lindman, 2013). From this, there is clearly a need for increased knowledge about how OSS communities are affected by a fork.

There are two specific objectives. For the first objective, we characterise community evolution over time for the LibreOffice project and the related OpenOffice.org and Apache OpenOffice projects. For the second objective, we report on insights and experiences from participants in a community of the branched project LibreOffice in order to explain how and why the project has evolved after the fork from its base project OpenOffice.org.

The paper makes four novel contributions. First, we establish a characterisation of the LibreOffice project and the related OpenOffice.org and Apache OpenOffice projects with respect to history, governance, and activity. Second, we present findings regarding developer commitment with the projects under different governance regimes. Third, we present findings regarding retention of committers in the projects under different governance regimes. Fourth, we report on rich insights and experiences from participants in the LibreOffice project with a view to characterising its community and its way of working. In addition, we demonstrate approaches involving metrics for analysing long-term sustainability of communities (with or without forks) in OSS projects, and illustrate their use on different OSS projects.

There are five reasons which motivate a study on the LibreOffice project. Firstly, LibreOffice is one of few OSS projects which have had an active community for more than 10 years (when including the development in OpenOffice.org), with significant commercial interest.
Secondly, there have been tensions within the OpenOffice.org project which finally led to the creation of the Document Foundation and the LibreOffice project (Byfield, 2010; Documentfoundation, 2013a). Thirdly, the project has reached a certain quality in that it has been adopted for professional use in a variety of private and public sector organisations (Lundell, 2011; Lundell and Gamalielsson, 2011). Therefore, its community is likely to attract a certain level of attention from organisations and individuals. Fourthly, previous studies of the base project OpenOffice.org (Ven et al., 2007) and more recent studies of LibreOffice (Gamalielsson and Lundell, 2011) show that there is widespread deployment in many organisations in a number of countries. This in turn imposes significant challenges for a geographically distributed user community. Fifthly, previous results (Gamalielsson and Lundell, 2011, 2012) and anecdotal evidence from an official spokesperson for the LibreOffice project (Nouws, 2011) suggest significant activity in the LibreOffice community. This motivates a more in-depth investigation of how and why the LibreOffice project evolved.

Hence, there is a need to extend previous studies on the LibreOffice project and in so doing include investigation of the project which LibreOffice was forked from (the OpenOffice.org project) and also alternative branches (the Apache OpenOffice project). An investigation of the OpenOffice.org project is interesting since it has been widely deployed. Further, the project is a natural source for recruitment to the LibreOffice project. Similarly, Apache OpenOffice is also interesting to investigate since it is the project that succeeded the OpenOffice.org project after Oracle abandoned it. Further, the investigation of Apache OpenOffice enables a more comprehensive study of community dynamics since the OpenOffice.org project is a potential source for recruitment to the Apache OpenOffice project as well.

For the rest of this paper we position our exploration of sustainability of OSS communities in the broader context of previous research on OSS communities (Section 2). We then clarify our research approach (Section 3), and report on our results (Sections 4 and 5). Thereafter, we analyse our results (Section 6) followed by discussion and conclusions (Section 7).

2. On sustainable Open Source software communities

Many companies need to preserve their systems and associated digital assets for more than 30 years (Lundell et al., 2011), and in some industrial sectors (such as avionics) even more than 70 years (Blondelle et al., 2012b; Robert, 2006). In such usage scenarios “there will be problems if the commercial vendor of adopted proprietary software leaves the market”, with increased risks for long-term availability of both software and digital assets (Lundell et al., 2011). Similarly, for organisations in the public sector, many systems and digital assets need to be maintained for several decades. This causes organisations to worry about different types of lock-in and the inability to provide long-term maintenance of critical systems and digital assets (Lundell, 2011). For this reason, sustainability of communities has been identified as essential for long-term sustainability of OSS.

There are many different aspects of an OSS project that can affect community sustainability. Good project management practice includes considering different incentives for contributing to OSS communities.
This in turn may affect the future sustainability of communities (Bonaccorsi and Rossi, 2006). Previous research has shown that there are a number of different kinds of motivations for individuals and firms that have impact on any decision concerning participation in OSS projects. Such motivations are sometimes categorised into economic, social, and technological types of incentives (Bonaccorsi and Rossi, 2006). Earlier research also suggests that an effective structure of governance is a basis for healthy and sustainable OSS communities (de Laat, 2007). In particular, aspects such as clear leadership, congruence in terms of project goals, and good team spirit are of fundamental importance. Moreover, the community manager in an OSS project plays a key role in achieving an effective structure of governance (Michlmayr, 2009). Further, the licensing of OSS may affect the community. It has been claimed that “fair licensing of all contributions adds a strong sense of confidence to the security of the community” (Bacon, 2009). It has also been claimed that the choice of OSS license type “can positively or negatively influence the growth of your community” (Engelfriet, 2010). To successfully master the art of establishing a long-term sustainable OSS community is a huge challenge. As in all organisations, there are “times in every community when repetition, housekeeping, and conflict play a role in an otherwise enjoyable merry-go-round. When the community begins to see more bureaucracy and repetition than useful and enjoyable contributions, something is wrong.” (Bacon, 2009)

A fork is often a consequence of inadequate OSS project governance. It has been claimed that forks “are generally started when a number of developers do not agree with the general direction in which the project is heading” (Ven and Mannaert, 2008). In particular, conflicts within communities can arise due to inadequate working processes, lack of congruence concerning project goals, and unclear (or in other ways inadequate) leadership. There are different views on what is considered an OSS project fork. It has been claimed that in order to be considered a fork, a project should (Robles and Gonzalez-Barahona, 2012): (1) have a new project name, (2) be a branch of the original OSS project, (3) have an infrastructure that is separated from the infrastructure of the original project, e.g. web site, mailing lists/forums, and SCM (Software Configuration Management system), (4) have a new developer community that is disjoint from the community of the original project, and (5) have a different structure of governance. There are also related concepts that are similar to OSS project forking, such as (Robles and Gonzalez-Barahona, 2012): cloning (which involves the design of a software system that mimics another system), branching (where source code is duplicated within an SCM, creating parallel threads of development), derivation (which involves the creation of a new software system that is based on an existing system and which is compatible with the existing system), and modding (where existing software is enhanced, typically by enthusiasts, by providing patches and extensions to the existing software). There are different possible outcomes of a fork attempt. Four different categories have been identified by Wheeler (2007): (1) the forked project dies (e.g. libc/glibc), (2) the forked project re-merges with the original project (e.g. gcc/egcs), (3) the original project dies (e.g.
XFree86/X.org), and (4) successful branching, where both the original and forked projects succeed and typically have separate communities. A possible fifth outcome is that both the original and forked projects die (Robles and Gonzalez-Barahona, 2012).

Governance is of fundamental importance for sustainability and evolution of an OSS project and its associated communities. Three different phases of governance have been identified by de Laat (2007): (1) “spontaneous” governance, (2) internal governance, and (3) governance towards outside parties. The first phase of governance concerns the situation where the community (including both volunteer and potentially commercial actors) is self-directing without any formal and explicit control or coordination. Given the licensing framework, the control and coordination that emerge stem from the degree of contribution by individual members. High performing members of a community may become informal leaders. The second phase is often adopted in larger projects that have existed for a longer time, and involves formal and explicit control and coordination in order to support more effective governance. Different tools are used for this, including modularisation of software, assignment of roles to contributors, delegation of decision making, training and indoctrination, formalised infrastructure to support contributors, and leadership style (autocracy/democracy). A third phase of governance became necessary due to an increased external interest in OSS projects from national and international organisations in both the private and public sector. This increased institutionalisation of OSS led to an increased risk of litigation due to software patent infringements. As a solution, initiatives were taken to create legal shells around OSS projects to protect against lawsuits. One way of implementing this is by establishing non-profit foundations (such as the Linux Foundation and the Mozilla Foundation) for the governance of OSS projects.

In the context of OSS projects, it has been shown that “little research has been conducted on social processes related to conflict management and team maintenance” (Crowston et al., 2012). There are several open questions related to this, such as “How is team maintenance created and sustained over time?” (Crowston et al., 2012). Our study is also motivated by the fact that there is a lack of research presenting rich insights from large and widely deployed OSS projects. In particular, there is a need for increased knowledge related to community involvement in projects involving a fork. We also note that there are different, and seemingly conflicting, views amongst practitioners concerning the effect of a fork on involved projects and associated communities. This further motivates our study. For the remainder of this section we position our study with respect to earlier research.

There are a few studies focusing on forks in an OSS context. However, none of these studies focuses on community involvement over time or investigates specific OSS projects in depth. One of these studies focused on motivations for forking SourceForge.net hosted OSS projects (Nyman and Mikkonen, 2011). Another study surveyed a large number of OSS project forks with a specific focus on the temporal evolution of forks, reasons for forking, and outcomes of forks (Robles and Gonzalez-Barahona, 2012). A similar but more limited study focused on the motivations and impact of the fork mechanism in OSS projects (Visser, 2012).
Another study has a focus on code maintenance issues in forked projects in the BSD family of operating systems (Ray and Kim, 2012).

Further, there are studies on the evolution of OSS projects over time, but such studies do not always have a community focus and are not always targeted at specific projects. Examples include a study on the total growth rate of OSS projects (Deshpande and Riehle, 2008), and work on the evolution of social interactions for a large number of projects on SourceForge.net over time (Madey et al., 2004). Another example is a study on survival analysis of OSS projects involving the application of different metrics based on the duration of thousands of projects in the FLOSSMETRICS database (Samoladas et al., 2010). There are also studies which focus on the evolution of software over time for specific OSS projects but which do not consider the community aspect. An example is a study on the Linux kernel based on Lehman’s laws of software evolution, which involved the application of code oriented metrics over time (Israeli and Feitelson, 2010). A similar approach was used in a case study on the evolution of Eclipse (Mens et al., 2008). Further, the growth of FreeBSD and Linux was studied and compared to earlier results on code evolution (Izurieta and Bieman, 2006). Another study on the topic of software evolution proposes a model of the Linux kernel life cycle (Feitelson, 2012).

A somewhat different strand of research involves the development and application of different kinds of statistical measures for estimation and prediction of the survivability (Raja and Tretter, 2012; Wang, 2012), success (Crowston et al., 2003, 2006; Lee et al., 2009; Midha and Palvia, 2012; Sen et al., 2012; Subramaniam et al., 2009; Wiggins et al., 2009; Wiggins and Crowston, 2010) and attractiveness (Santos et al., 2013) of OSS projects. Such measures may consider factors related to (Wang, 2012): developer characteristics (e.g. user and developer effort, service quality, leadership and adherence to OSS ideology), software characteristics (e.g. license terms, targeted users, software modularity and quality), and community attributes (e.g. organisational sponsorship, financial support, trust and social network ties). However, forks are usually not explicitly addressed in such research and the focus is more on the overall survivability or success of OSS projects rather than on the behaviour of communities associated with the projects. Further, such research typically uses a large selection of projects from different OSS forges for statistical validation of the measures, whereas our study provides an in-depth analysis of a few inter-related OSS projects employing both a quantitative and a qualitative approach.

There are other studies which do have a focus on the evolution of communities for specific OSS projects, but do not address the effects of a fork. For example, case studies have been conducted on the Debian project involving quantitative investigations of the evolution of maintainership and volunteer contributions over time (Robles et al., 2005; Michlmayr et al., 2007). Another study involved an investigation of developer community interaction over time for Apache web server, Gnome and KDE using social network analysis (Lopez-Fernandez et al., 2006). A similar study involved the projects Evolution and Mono (Martinez-Romo et al., 2008).
Case studies on the Nagios project (Gamalielsson et al., 2010) and the Top-Cased and Papyrus projects (Gamalielsson et al., 2011) addressed community sustainability and evolution over time with a special focus on organisational influence. Other research partially focusing on community evolution includes early case studies on large and well-known OSS projects such as the Linux kernel (Moon and Sproul, 2000), Gnome (German, 2003), the Apache web server (Mockus et al., 2002), Mozilla (Mockus et al., 2002), and FreeBSD (Dinh-Trong and Bieman, 2005).

Further, apart from our own earlier studies on LibreOffice (Gamalielsson and Lundell, 2011, 2012), no in-depth studies on any of the three projects (LibreOffice, OpenOffice.org, and Apache OpenOffice) with a focus on the evolution of OSS project communities over time have been reported. In a study on the process of participation in OSS communities, Shibuya and Tamai (2009) compare the communities for the Writer tool in the OpenOffice.org project, the MySQL server in the MySQL project, and GTK+ in the GNOME project. This was done using different kinds of project documentation and quantitative data from bug tracking systems and source code repositories. However, this is a very limited study which only partially covers the OpenOffice.org project. There is another study that also has a community focus, but from an open user experience design perspective rather than a community evolution perspective (Bach and Carroll, 2010). Further, there are studies on OpenOffice.org without a community focus. One such study focused on code evolution (Rossi et al., 2009). Specifically, the study explored the relation between code activities, bug-fixing activities, and software release dates for five projects including OpenOffice.org. In another study the maintenance process of the OpenOffice.org project was analysed using its defect management and version management systems (Koponen et al., 2006). There are also studies focusing on issues related to migration, adoption, and deployment of OpenOffice.org (Huysmans et al., 2008; Rossi et al., 2006; Ven et al., 2010; Seydel, 2009).

3. Research approach

To address our first objective (to characterise community evolution over time for the LibreOffice project and the related OpenOffice.org and Apache OpenOffice projects), we undertook an analysis of all three projects. This was done through a review of documented project information and a quantitative analysis of project repository data in order to investigate the sustainability of OSS communities. This included analysis of different project phases under different governance regimes. For the OpenOffice.org project this encompassed both the time period with governance by Sun Microsystems and the time period with governance by Oracle. For the rest of this paper we refer to the three projects as OO (OpenOffice.org), LO (LibreOffice), and AOO (Apache OpenOffice). OO under Sun Microsystems governance is hereafter referred to as SOO, and OO under Oracle governance is hereafter referred to as OOO.

To contextualise insights from the LibreOffice project, we undertook an analysis of data from a number of different sources. First, we established a characterisation of the three projects (LO, OO and AOO) by undertaking an analysis of: the history and governance of the projects, the release history, and commits to the SCM and contributing committers over time. 
Second, to investigate developer commitment with the projects we used different metrics that consider to what extent committers have been involved in and contributed to the different projects under different governance regimes. Third, to investigate retention of committers in the projects under different governance regimes we used different metrics that consider: the recruitment of committers over time, the retirement of committers over time, the distribution of commits for committers contributing to different combinations of projects, and the temporal commitment patterns between projects for committers.

In our quantitative analysis we adopt and extend approaches from earlier studies (Gamalielsson et al., 2011; Gamalielsson and Lundell, 2011, 2012). This is done in order to analyse contributions to the OSS projects over time in terms of committed SCM artefacts. SCM data was collected from the official repositories for LO and AOO, and for OO from a website recommended on the AOO website which keeps the legacy source code. The data for the LO project was collected from the LO website,1 where the Git sub-repositories “core”, “binfilter”, “dictionaries”, “translations” and “help” were used in the analysis. The sub-repositories were chosen after a personal dialogue with key LO contributors. For the OO project, data was collected from an archive website,2 where the Mercurial repository was used in the analysis. Data for the AOO project was collected from the AOO website,3 where the SVN repository was used in the analysis. Data until 31 May 2013 were used for LO and AOO, and data until the end of the OO project (April 2011) were used for OO. Logs for all projects were extracted from the repositories and were thereafter analysed using custom-made scripts. Further, a semi-automated approach involving manual inspection was used to associate commit ID aliases with the same actual committer (a sketch of this processing step is given below).
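The custom-made scripts themselves are not published with this paper; the following minimal sketch in Python illustrates the kind of log extraction and alias resolution described above for a Git repository. The alias table, names, and repository path are illustrative assumptions, not study data; analogous extraction applies to the Mercurial and SVN repositories via hg log and svn log.

```python
# Minimal sketch (not the study's actual scripts) of extracting commit logs
# from a Git repository and resolving committer aliases to one canonical name.
import subprocess
from collections import Counter

# Hypothetical alias table: in the study this mapping was built
# semi-automatically, with manual inspection of names and addresses.
ALIASES = {
    "jdoe": "Jane Doe",
    "jane.doe@example.org": "Jane Doe",
}

def read_commits(repo_path):
    """Yield (canonical committer, YYYY-MM) pairs from a Git history."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%an|%ad", "--date=short"],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in log.splitlines():
        name, date = line.rsplit("|", 1)
        yield ALIASES.get(name, name), date[:7]  # keep only YYYY-MM

def monthly_commit_counts(repo_path):
    """Count commits per (committer, month): the raw data behind Figs. 2-3."""
    return Counter(read_commits(repo_path))
```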
To address our second objective (to report on insights and experiences from participants in a community of the branched project LibreOffice in order to explain how and why the project has evolved after the fork from its base project OpenOffice.org), we undertook a case study on the LO project in order to investigate experiences from participants in the project, with a view to gaining insights into the effects of the fork that led to the establishment of the LO project.

In order to analyse insights and experiences concerning participation in the LO project, the two researchers conducted interviews with active participants in the LO community. As our goal was to specifically identify incentives and motivations for the creation of the LO project, our strategy for identifying potential interviewees was to include key informants in key roles and interviewees with long experience of the project. In addition, we also sought to include interviewees with less experience who joined the project after the fork, as a strategy to include additional perspectives. Interviewees were selected on the basis of being actively involved in the LO project.

Data collection was based on the results of face-to-face interviews conducted in English. Interviews were recorded, transcribed, and vetted by each interviewee. Questions were prepared in advance, and shown to each interviewee before the interview was conducted. Each interview was conducted in an informal setting and allowed each interviewee to extensively elaborate on all issues covered during the interview. A total of 12 interviews were conducted, ranging in length from 8 to 43 min and resulting in 67 pages of transcribed and vetted interview data.4 In this process each interviewee was allowed to further elaborate on and clarify their responses.

Analysis of the transcribed interview data took place over an extended time period to allow time for reflection. Individual analysis was supplemented by group sessions in which the researchers discussed and reflected on each other’s interpretations.

The coding of interview data was conducted in a manner which follows Glaser’s ideas on open coding (Lings and Lundell, 2005). The unit of coding was sentences or paragraphs within interview notes. The focus was on constant comparison: indicator to indicator, indicator to emerging concepts and categories (Lings and Lundell, 2005). The goal of the analysis was to develop and refine abstract concepts, which are grounded in data from the field (as interpreted via collected data in the transcriptions). The coding process resulted in a set of categories, each presented as a subsection in Section 5 of this paper.

---

1 http://www.libreoffice.org/developers-2/, accessed 18 June 2013.
2 http://hg.services.openoffice.org/DEV300, accessed 18 June 2013.
3 http://incubator.apache.org/openofficeorg/source.html, accessed 18 June 2013.
4 All interviews were conducted during February 2012.

4. Community evolution over time

In this section we report on results related to the first objective. Table 1 presents the main results from our observations concerning community evolution over time, as reported in the following sections.

4.1. Characterisation of projects

In this section we present an overarching characterisation of the three projects. For each project we provide a historical overview, describe its governance, and report on project activity.

4.1.1. Organisations and overview of projects

The OO project was established as an OSS project on 13 October 2000 (Openoffice, 2004). Its initial release was on 1 October 2001 and the first stable version (v1.0) was released on 30 April 2002 (Openoffice, 2002). Initial development began within StarDivision, a German company that was acquired by Sun Microsystems in mid-1999 (Crn, 1999). Before OO was established, development and provision of the code base were closed source. OO was governed by its community council, which comprised OO community members who also created a charter for the establishment of the council (Openoffice, 2013). The Sun contributor agreement needed to be signed by developers wishing to contribute, whereby contributions were jointly owned by the developer and the Sun corporation. The Oracle corporation acquired Sun (and thereby also the OO project) on 27 January 2010 (Oracle, 2010). Oracle also used a contributor agreement (almost identical to the Sun contributor agreement) that needed to be signed by developers wishing to contribute to the project. Oracle stopped support for commercial OpenOffice.org on 15 April 2011 (Marketwire, 2011a).

LO is an LGPL-licensed Open Source office productivity tool for the creation and editing of digital artefacts in the Open Document Format (ODF), which is its native file format. The Document Foundation (TDF) was established on 28 September 2010 (Linuxuser, 2010) under German jurisdiction. The first beta release of LO was provided on the same date (Pclosmag, 2011). 
TDF’s mission is to facilitate the evolution of the LO project, which has been a fork of the OO project since the date TDF was established (Documentfoundation, 2013a). TDF is an independent, meritocratic, self-governing, not-for-profit foundation that evolved from the OO community. It was formally established by members from the OO community in September 2010 and is supported by a large number of small (and some larger) organisations. It has a steering committee currently consisting of eight members (excluding six deputy members), and there are also four other founding members. Further, there are four official spokespersons. TDF is open to individuals who can and are willing to contribute to its activities and who also agree with the core values of the foundation. Organisational participation is also encouraged, for example by supporting individuals financially to work and contribute in the community. TDF commits itself to giving “everyone access to office productivity tools free of charge to enable them to participate as full citizens in the 21st century” (Documentfoundation, 2013b). Further, TDF supports the preservation of mother tongues by encouraging the translation, documentation and promotion of TDF-facilitated office productivity tools in the languages of individual contributors. Moreover, TDF commits to allowing users to create and maintain their digital artefacts in open document formats based on open standards. In addition, TDF openly seeks voluntary financial contributions (donations) via the project web site from individuals and organisations that want to support the further evolution of the LO project and TDF. Besides strong support from volunteer contributors, LO also receives support from commercial companies including RedHat, Novell and Canonical (Documentfoundation, 2013c).

Oracle donated the OO project to the Apache Software Foundation (ASF) on 1 June 2011 (Marketwire, 2011b). The project was thereafter established as an (incubating) ASF project on 13 June 2011 after undergoing a proposal and voting process (Apache, 2013a). In connection with this, the new project was given the name Apache OpenOffice. AOO is licensed under APL v2 and comprises six office productivity applications. The first stable release of AOO (v3.4) was provided on 8 May 2012 (Openoffice, 2012). Apache OpenOffice became a top-level Apache project on 17 October 2012 (Apache, 2013a). ASF was established on 1 June 1999 under U.S. jurisdiction (Apache, 1999). The mission of ASF is to establish projects delivering freely available and enterprise-grade products that are of interest to large user communities (Apache, 2013b). Apart from AOO, ASF maintains other well-known projects such as HTTP Server, Struts, Subversion, and Tomcat. Like TDF, ASF is an independent, meritocratic, self-governing, and not-for-profit organisation, which has been governed since 1999 by the community members that collaborate within ASF projects. ASF has a board of directors that is annually elected by members of ASF, and which manages the internal organisational affairs of the foundation according to the ASF bylaws. The board consists of nine individuals, and in turn appoints a set of officers whose task is to take care of the daily operation of the foundation. Decision making in individual ASF projects regarding content and direction is delegated by the board of directors to so-called project management committees. Each of these committees can govern one or several project communities. 
Individuals (both unaffiliated and company-affiliated) who are willing and able to contribute to ASF projects are welcome to participate. Further, ASF accepts donations and has a sponsorship program for individuals and organisations willing to contribute financially. We also note that IBM is an active supporter of and contributor to the AOO project (IBM, 2011). Finally, we note that long before the establishment of the AOO project, researchers indicated that leadership and control in the OO project under Sun governance “is remarkably similar to that of Apache” (Conlon, 2007).

Fig. 1 summarises the evolution of the projects (OO, LO, and AOO) over time, and includes selected major events related to each project. Moreover, it illustrates how OO (black upper bar), LO (dark grey middle bar), and AOO (light grey lower bar) are interrelated and overlap in time.

4.1.2. Project activity

The version history of OO, LO and AOO is shown in Table 2. It can be observed that there has been a continuous flow of new OO releases for more than 10 years. On 25 January 2011 the Document Foundation (TDF) announced the first stable version of LO, which constitutes a fork from OO (Documentfoundation, 2013a). TDF has thereafter regularly provided new releases of LO. Further, the first stable version of AOO, which replaced the discontinued OO project, was announced on 8 May 2012.

Table 2
Version history of OpenOffice.org (OO), LibreOffice (LO), and Apache OpenOffice (AOO).

| Version | Date (YYYY-MM-DD) |
|------------|-------------------|
| OO initial | 2001-10-01 |
| OO 1.0 | 2002-04-30 |
| OO 1.1 | 2003-09-02 |
| OO 2.0 | 2005-10-20 |
| OO 2.1 | 2006-12-12 |
| OO 2.2 | 2007-03-28 |
| OO 2.3 | 2007-09-17 |
| OO 2.4 | 2008-03-27 |
| OO 3.0 | 2008-10-13 |
| OO 3.1 | 2009-05-07 |
| OO 3.2 | 2010-02-11 |
| LO 3.3 B1 | 2010-09-28 |
| LO 3.3 | 2011-01-25 |
| OO 3.4 B1 | 2011-04-12 |
| LO 3.4 | 2011-06-03 |
| LO 3.5 | 2012-02-14 |
| AOO 3.4 | 2012-05-08 |
| LO 3.6 | 2012-08-12 |
| LO 4.0 | 2013-02-14 |
| AOO 4.0 | 2013-07-23 |

Fig. 2. Number of monthly commits for the OpenOffice.org (black), LibreOffice (dark grey) and Apache OpenOffice (light grey) projects.

Fig. 3. Number of monthly committers for the OpenOffice.org (black), LibreOffice (dark grey) and Apache OpenOffice (light grey) projects.

The developer activity in OO, LO and AOO is presented in Fig. 2, which shows the number of commits for each month from September 2000 to May 2013. We note that activity in the OO project varies, with distinct peaks in connection with the OO 2.0 (September 2005) and OO 2.4 (March 2008) releases. It can also be observed that the activity level decreased dramatically around August 2008, just before the release of OO version 3.0. A contributing reason for this significant drop in activity may be that major changes in terms of features had been implemented for version 3 and that subsequent activity was more focused on bug fixing. We can also observe that the activity in LO and AOO varies over time, but with peaks less distinct than those observed for OO.

Fig. 3 illustrates the number of active committers during each month of the projects. It can be observed that a large number of committers are active early in the OO project, and that the activity decreases considerably shortly after the release of the first stable version of OO (version 1.0) in May 2002. The number of committers increases to a higher level after the release of OO 3.1 in May 2009. We note that there is a discrepancy between the number of monthly commits and the number of monthly committers in OO in the interval between January 2003 and January 2009, in that relatively few monthly committers contribute a large number of monthly commits. This may be explained by the fact that there are a number of both first- and second-level releases in the interval, which often co-occur with an elevated level of commits. Further, a few committers often provide the majority of commits in OSS projects (see Section 4.2 for more details concerning commitment with the projects). For LO, it can be noted that committer participation peaks significantly in October 2010 and during the subsequent months in connection with the fork from OO. LO participation also peaks in connection with the release of version 4.0 in February 2013. It can also be observed that there was a rise in committer participation in AOO until September 2012.
4.2. Commitment with the projects

In this section we report on the commitment with the projects in terms of SCM contributions. Fig. 4 provides an overview of the commitment with the projects. The figure illustrates the number of committers that have contributed to each of the seven possible (mutually exclusive) combinations of the three projects. The area of a combination reflects the number of committers, and the colour of a combination represents the average number of commits per committer in all projects for the combination. In total, there have been 795 unique code contributors who have been active in at least one of the three projects (the sum of committers across all areas). The main observation in Fig. 4 is that the 67 contributors who have committed to both OO and LO have provided the overwhelming majority of the commits (4339 commits per committer). Those committers constitute the backbone of the developer communities of both OO and LO. Further, the 8 contributors to all three projects have provided a substantial number of commits (1329 commits per committer). Contributors in all other project combinations have had a very limited impact with respect to the number of commits (127 commits per committer or less).

Table 3 provides a more detailed picture of commitment to the separate projects for the combinations illustrated in Fig. 4. The table shows the proportion of commits provided in each project by the committers in the seven possible combinations of the three projects. The table also shows (in brackets) the number of commits that the committers in the different project combinations contribute in the different projects. It can be observed that the 67 contributors who have committed to both OO and LO have provided the majority of the commits in both OO (92%) and LO (56.4%). Further, the 133 committers only participating in OO have provided only 6.1% of all OO commits. This is in contrast with the situation where committers have contributed either only to LO or only to AOO. In these two cases the contributions constitute 37.2% of all LO commits and 26.4% of all AOO commits, respectively. Further, we note that the 17 committers who have contributed to both LO and AOO (but not OO) have contributed significantly to AOO (32.1%) but very little to LO (0.3%). It may also be considered surprising that only one of the AOO committers has participated in both OO and AOO (but not LO). It is perhaps also unexpected that committers contributing to all three projects are behind 41.3% of all commits in AOO.

Table 3
Proportion of commits for committers contributing to different combinations of projects (number of commits in brackets).

| Project combination | LO prop. [%] | AOO prop. [%] | OO prop. [%] |
|---------------------|--------------|---------------|--------------|
| LO | 37.2 (23,846) | – | – |
| AOO | – | 26.4 (939) | – |
| OO | – | – | 6.1 (16,867) |
| LO & AOO | 0.3 (170) | 32.1 (1140) | – |
| LO & OO | 56.4 (36,152) | – | 92.0 (254,745) |
| AOO & OO | – | 0.2 (8) | <0.1 (1) |
| LO, AOO & OO | 6.1 (3914) | 41.3 (1466) | 1.9 (5121) |

Further, as mentioned earlier, commits have been contributed to the projects under different governance regimes, which have different lengths (SOO: 112 months, OOO: 16 months, LO: 33 months, and AOO: 24 months). Of the 209 committers in OO, 197 committers have been active during the Sun governance of the OO project and contributed 267,011 commits. Further, 81 committers have contributed 9723 commits during the Oracle governance of the OO project.

Fig. 5 illustrates the proportion of all commits as a function of the proportion of committers for SOO (solid black trace), OOO (dashed black trace), LO (dark grey trace), and AOO (light grey trace). It can, for example, be noted that for SOO and LO, 10% of the committers (19 and 64, respectively) contribute 90.5% (241,645) and 88.8% (56,905) of the commits. Further, the same proportion of committers in OOO and AOO (8 and 4 committers, respectively) contribute 41.6% (4045) and 54.1% (1922) of the commits, respectively. Hence, for SOO and LO, a relatively small proportion of committers contribute the majority of the commits, whereas a larger proportion of the committers in OOO and AOO contribute the majority of the commits. It should also be mentioned that a large proportion of all committers contribute only a few commits (5 commits or fewer are made by 21.3% of the SOO committers, 12.3% of the OOO committers, 54.3% of the LO committers, and 34.9% of the AOO committers).

Table 4, which is based on the data illustrated in Fig. 5, shows the proportion of commits for different proportions of committers for SOO, OOO, LO, and AOO. Similarly, Table 5 shows the proportion of commits for the top N committers in the projects for different values of N. For example, the 5% most active LO committers contribute 78% of all LO commits. It can be observed that the proportion of commits for LO in Table 5 is significantly smaller than the proportion of commits for LO in Table 4. This is because there are many committers (645) in LO, and the top 5 committers are therefore much fewer than the top 5% of committers. For AOO it is the other way around: the top 5 committers constitute a much greater proportion of all committers than the top 5% do, and therefore the proportion of commits for AOO is greater in Table 5. A sketch of how such concentration figures can be computed from per-committer commit counts is given after the tables.

Table 4
Proportion of commits for different proportions of committers in SOO, OOO, LO, and AOO.

| Prop. of committers | SOO | OOO | LO | AOO |
|---------------------|-----|-----|-----|-----|
| Top 5% | 86% | 27% | 78% | 33% |
| Top 15% | 93% | 52% | 93% | 69% |
| Top 20% | 95% | 62% | 95% | 80% |

Table 5
Proportion of commits for different numbers of committers in SOO, OOO, LO, and AOO.

| Number of committers | SOO | OOO | LO | AOO |
|----------------------|-----|-----|-----|-----|
| Top 5 | 79% | 31% | 33% | 60% |
| Top 15 | 89% | 59% | 58% | 93% |
| Top 20 | 91% | 69% | 66% | 96% |
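The following minimal sketch shows how such concentration figures can be derived from per-committer commit counts (for example, those produced by the log-processing step sketched in Section 3). The commit counts used here are made up for illustration and are not study data.

```python
# Sketch of the concentration figures in Tables 4 and 5: the share of all
# commits made by the top committers, by fraction or by absolute number.
def commit_share(counts, top_fraction=None, top_n=None):
    """`top_fraction` mirrors Table 4 (top X% of committers);
    `top_n` mirrors Table 5 (top N individual committers)."""
    ranked = sorted(counts, reverse=True)
    if top_fraction is not None:
        top_n = max(1, round(top_fraction * len(ranked)))
    return sum(ranked[:top_n]) / sum(ranked)

# Illustrative usage with made-up per-committer commit counts:
counts = [900, 400, 120, 30, 20, 10, 5, 5, 3, 2]
print(f"Top 20% of committers: {commit_share(counts, top_fraction=0.20):.0%}")
print(f"Top 5 committers:      {commit_share(counts, top_n=5):.0%}")
```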
4.3. Retention of committers

In this section we report on the retention of committers for the different projects. Fig. 6 shows the recruitment of committers, the retirement of committers, and the current number of active committers in the projects for each project month. Recruitment is represented by the accumulated number of committers who have made their first commit (solid black trace). Retirement is represented by the accumulated number of committers who have made their last commit (dashed black trace). The current number of active committers is represented by the difference between the number of recruited and retired committers (grey trace); a sketch of how these series can be derived from commit data is given below. It can be observed that LO has by far the highest recruitment rate, with approximately 20 new committers each month on average. At the same time, LO suffers from a high retirement rate. This is perhaps not surprising since, as mentioned earlier, half of all LO committers have provided only 5 commits or fewer. However, we cannot observe any long-term trend towards a decreased number of active committers. There have roughly been between 100 and 150 currently active committers since the start of the LO project. SOO had a high recruitment rate during the first two years of the project, but a considerably lower recruitment rate during the rest of the project, except for the last few months. From having approximately 75 currently active committers on average during the first two years, SOO stabilised at around 50 currently active committers during the second half of the project. Noticeable about OOO is that recruitment was slow except for the first few months. Further, the retirement rate in OOO was comparably high, especially during the later part of the project. This led to a dramatic drop in currently active committers from the 10th project month and onwards. AOO has had a positive trend in terms of the number of active committers during the first 16 project months. This is due to a high recruitment rate and a low retirement rate. However, AOO has lately experienced a stagnation in recruitment and an increasing rate of retirement. This has resulted in a halving of the number of active committers in AOO during the second project year. We acknowledge that the total number of project months differs between the projects (SOO: 112 months, OOO: 16 months, LO: 33 months, and AOO: 24 months).
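The following minimal sketch derives the recruitment, retirement, and active-committer series of Fig. 6 from commit records. The record format, (committer, month index), is an assumption for illustration.

```python
# Sketch of the series behind Fig. 6. Recruitment accumulates committers by
# the month of their first commit, retirement by the month of their last
# commit, and the active count is the difference between the two.
def activity_series(records, n_months):
    """`records` is an iterable of (committer, month_index) pairs,
    with month indices in the range 0 .. n_months - 1."""
    first, last = {}, {}
    for committer, month in records:
        first[committer] = min(month, first.get(committer, month))
        last[committer] = max(month, last.get(committer, month))
    recruited = [0] * n_months
    retired = [0] * n_months
    for m in first.values():
        recruited[m] += 1
    for m in last.values():
        retired[m] += 1
    for m in range(1, n_months):  # accumulate counts over months
        recruited[m] += recruited[m - 1]
        retired[m] += retired[m - 1]
    active = [r - t for r, t in zip(recruited, retired)]
    return recruited, retired, active
```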
The distribution of commits among committers is further explored in the following, in order to better explain commitment with the different projects at the committer level. Fig. 7 provides details regarding the distribution of commits in LO (dark grey bar colour) and OO (black bar colour) for the 67 committers contributing only to LO and OO. Committers are sorted by the sum of their commits in the two projects (in descending order). As stated earlier in connection with Table 3, the black area represents 92% of all commits in OO and the dark grey area represents 56.4% of all commits in LO. However, the LO commits comprise only 12.4% of all commits in Fig. 7, and the OO commits comprise 87.6%. At the level of individual committers, it can be observed that one of the projects often dominates heavily. For example, the top committer in Fig. 7 contributes 89,931 commits to OO, but only two commits to LO. In fact, the top six committers contribute only 0.4% of all their commits to LO.

Similarly, Fig. 8 provides details regarding the distribution of commits for the 17 committers contributing only to LO (dark grey bar colour) and AOO (light grey bar colour). The light grey area represents 32.1% of all commits in AOO, and the dark grey area represents 0.3% of all commits in LO. Given these proportions, it is not surprising that the contribution to the different projects is unbalanced. The LO commits comprise only 13% of all commits in Fig. 8, and the AOO commits comprise 87%. The imbalance is clearly visible at the level of individual committers in Fig. 8. For example, committers 3, 4 and 8 contribute a very small proportion of commits to LO. Only committer 10 contributes a larger proportion of commits to LO.

Fig. 9 provides details regarding the distribution of commits in LO (dark grey bar colour), AOO (light grey bar colour), and OO (black bar colour) for the 8 committers contributing to all three projects. The black area represents 1.9% of all commits in OO, the light grey area represents 41.3% of all commits in AOO, and the dark grey area represents 6.1% of all commits in LO. As in Figs. 7 and 8, the contribution to the different projects is somewhat imbalanced. The AOO commits comprise 14% of all commits in Fig. 9, and the LO and OO commits comprise 37.3% and 48.8%, respectively. As an example of imbalance for individual committers, the top committer contributes 2261 commits to LO, but only 69 to AOO. One aspect that can contribute to the imbalance in Figs. 7–9 is the fact that the projects have different life spans and have accumulated different total numbers of commits. For example, there have been 77 times as many commits in OO as in AOO.

Table 6
Major commitment patterns for committers who have contributed to LO. (In the original table, the commitment pattern column contains graphical timelines of involvement in OO, LO, and AOO; textual descriptions are given here where the patterns are defined in the text.)

| Pattern ID | Commitment pattern | Commits | Committers |
|------------|---------------------|---------|------------|
| LP1 | OO involvement followed by LO involvement (no AOO) | 33,642 (52.5%) | 58 (9.0%) |
| LP2 | LO only | 23,846 (37.2%) | 553 (85.7%) |
| LP3 | – | 3052 (4.8%) | 2 (0.3%) |
| LP4 | OO involvement overlapping with LO involvement (no AOO) | 2385 (3.7%) | 5 (0.8%) |

Tables 6 and 7 illustrate the major temporal commitment patterns between projects (OO in black, LO in dark grey, and AOO in light grey) for committers who have contributed to LO (Table 6) and AOO (Table 7). In total, 13 commitment patterns were identified for the 645 LO committers. The four most significant of these patterns (LP1 through LP4) are shown in Table 6. These four patterns account for 98.2% of all LO commits and for 95.8% of all LO committers. Similarly, Table 7 shows the four most significant patterns (AP1 through AP4) out of a total of 10 identified patterns for the 43 AOO committers. These four patterns account for 93.4% of all AOO commits and for 74.4% of all AOO committers. Each committer is assigned to exactly one pattern by comparing the dates of the first and latest commit in the projects the committer has been active in. For example, a committer is assigned to LP1 if the commitment in OO and LO has been sequential and the committer has not contributed to AOO. This means that for LP1 the latest commit in OO precedes the first commit in LO. Another example is LP4, where the involvement in OO overlaps with the involvement in LO. Hence, for LP4 the latest commit in OO is after the first commit in LO and the committer has not been active in AOO. In connection with each commitment pattern, Tables 6 and 7 show the number and proportion of commits and committers. The tables are sorted by the number of commits assigned to each pattern, in descending order. A sketch of this pattern-assignment logic is given below.
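The pattern-assignment logic can be sketched as follows, given each committer's first and last commit dates per project. Only the patterns explicitly defined in the text (LP1, LP2, LP4) are distinguished; the study's remaining patterns, including those involving AOO and the two further OO-and-LO patterns, are collapsed here, so this is an approximation rather than the study's full classification.

```python
# Sketch of assigning a committer to a commitment pattern from the first and
# last commit dates per project. Only LP1, LP2 and LP4 (as defined in the
# text) are distinguished; all other patterns are collapsed into "other".
def assign_lo_pattern(spans):
    """`spans` maps 'OO', 'LO', 'AOO' to (first_commit, last_commit) date
    pairs for the projects the committer has been active in."""
    if "AOO" in spans or "LO" not in spans:
        return "other"  # AOO-related patterns are not distinguished here
    if "OO" not in spans:
        return "LP2"    # committed to LO only
    if spans["OO"][1] < spans["LO"][0]:
        return "LP1"    # OO involvement strictly precedes LO involvement
    return "LP4"        # OO involvement overlaps with LO involvement

# Illustrative usage (ISO date strings compare correctly as strings):
print(assign_lo_pattern({"OO": ("2005-03-01", "2010-09-01"),
                         "LO": ("2010-10-15", "2013-05-20")}))  # -> LP1
```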
In Table 6 it is evident that the pattern accounting for the largest amount of LO commits (52.5%) is LP1, where committers have contributed to OO and LO in sequence but not to AOO. There are also other commitment patterns for committers involved in only OO and LO (LP4 and two other patterns not among the four most significant) which together account for 3.9% of the commits. The second most significant pattern in terms of commits (37.2%) is LP2, where committers have contributed only to LO. This pattern applies to the clear majority (85.7%) of the committers. The patterns LP1 and LP2 are clearly dominant and together involve 89.7% of all commits and 94.7% of all committers. It should also (once again) be pointed out that committers who have been involved in OO before their involvement in LO (LP1) contribute a greater proportion of the commits than those who have only contributed to LO (LP2).

In Table 7 it can be observed that the pattern accounting for the largest amount of AOO commits (29.2%) is AP1, where committers have contributed to LO within the period during which they have contributed to AOO. The second most significant pattern in terms of commits (26.4%) is AP2, where committers have contributed only to AOO. When comparing with the LO patterns, we find that a more diversified set of commitment patterns accounts for significant amounts of commits in AOO. Further, we note that a significant proportion of the AOO commits (41.3%) stems from committers who have previous, and in some cases current, experience in both OO and LO (AP3, AP4 and another pattern not shown in Table 7).

To sum up concerning recruitment to LO, 553 of the 645 committers in LO (constituting 85.7%) have not been active in OO or AOO, and have therefore been directly recruited to LO. Further, 75 of the 645 committers in LO have also contributed to OO. Of these 75 committers, 66 have contributed to OO before they started to contribute to LO and have thereafter not contributed to OO, and can therefore be claimed to have been recruited from OO to LO. These 66 committers are influential in that they together have provided the majority of the LO commits (58.7%). The remaining 9 of the 75 committers have been active in LO and OO in parallel. Further, 25 of the 645 committers in LO have also contributed to AOO, but these committers have contributed only 2.2% of all LO commits.

For AOO, 17 of the 43 committers (constituting 39.5%) have not been active in OO or LO, and have therefore been directly recruited to AOO. Further, 8 of the 43 committers in AOO have also contributed to OO before they started to contribute to AOO and have contributed to LO before or during their AOO involvement, and can therefore be claimed to have been recruited from OO and LO. These 8 committers together contribute a significant amount (41.3%) of all AOO commits. We also note that the 17 committers who have only contributed to AOO and LO have mostly contributed to the two projects in parallel and have contributed a considerable proportion of the AOO commits (32.1%).

5. Insights and experiences from the LibreOffice community
This section reports on results related to the second objective. Table 8 shows the main themes for investigation, with the associated main results from our observations concerning insights and experiences from the LibreOffice community.

All interviewees are active participants in the LO project, and several of them expressed that they have been active from the start of the project. Our interviewees include participants who were active in the formation of TDF, several of whom have central roles related to the LO project, although our interviewees also include some contributors with less experience of participation in the project. From this, it is appropriate to characterise our sample of interviewees as dominated by experts, and our research interviews can thereby largely be considered elite interviews.

Six broad categories emerged from our coding and analysis of the interview transcriptions. Each is presented as a separate section below, with a subheading that characterises the category.

5.1. Considerations for creation of the LibreOffice project

Over time, members of the OO community started to feel frustration and discontent due to a number of circumstances in the OO project. Concerns amongst community members included perceptions of vendor dominance, copyright assignment, lack of influence, lack of fun, and bureaucracy in the project. For example, as expressed by a community member: “I started in OpenOffice, and it was fun in the beginning, but over the year you were able to see behind, and I didn’t like what I saw.” Similarly, another community member expressed a view that “it stopped being fun. It stopped being an Open Source project under Oracle”. A different respondent particularly emphasised bureaucracy in the OO project as an inhibitor to contributing: “In the past, I tried once to get involved in OpenOffice by submitting patches, but that was hell of a job to do, because of all the bureaucracy in the project, so that’s why I didn’t follow up on that and just quit with it.” Overall, the essence of these circumstances seems to originate from a lack of trust.

From this, the idea of starting a new branch of the OO project evolved amongst community members. This course of events prompted many thoughts among members of the community, as illustrated in a comment raised by one person involved in the creation of LO: “When this whole story with Oracle started to look a bit fishy, you just meet people and you start talking, and you start thinking, and then you start planning.” Further, it is clear that a number of issues were considered before taking action, as illustrated by another person involved: “Before we started we had a lot of discussions. Shall we start? When do we start? How do we start? Which people do we get involved as soon as possible, or a bit later, or whatever?”. Once different issues had been considered it was time to take action, as expressed by a different respondent: “we founded the LibreOffice project, we got people together to agree and, you know, got the initial structure set up”.

Further, the choice of a copyleft license7 was mentioned as an important prerequisite for several contributors to the LO project. Hence, there seems to be consensus amongst contributors in the LO
project that permissive licenses should not be used for the project. As expressed by one respondent: “to me licensing is key and a copyleft license, a weak copyleft license, is pretty much mandatory for me to be interested in the project, because otherwise I know where it’s gonna go, pretty soon we will be writing proprietary software”. The importance of avoiding permissive licensing was further emphasised by another respondent: “the permissive license would lose half of our volunteer developers, because they are real volunteers. They are in the project for fun. They don’t want to give away their work to a corporation.” The same respondent also acknowledged that there are contributing companies that understand and act in accordance with fundamental values of the Open Source movement, and that contributors accept this: “They easily give away their work to companies like Suse, Redhat, Canonical, that contribute to the project, that are transparent in the way that they behave in the project.” Further, one respondent pointed out that, apart from upsetting the community, switching from a copyleft to a permissive license would require a time-consuming IP-clearance process. This process would require rewriting code solely because of the license, and could potentially stall the actual development of new features in the project.

---

7 OSS licenses are often broadly categorised as either copyleft licenses (e.g. GPL, which is a strong copyleft license, and LGPL, which is a weak copyleft license) or permissive licenses (e.g. BSD and MIT). The main difference between these two license categories is that copyleft licenses ensure that derivative work remains open source, whereas permissive licenses do not (Brock, 2013).

In essence, interviewees involved in the process of establishing the LO project seem to have considered its establishment, with an independent foundation (TDF) and a weak copyleft license, an inevitable action to take given the perceived dissatisfaction amongst community members in the OO project.

5.2. Perception of LibreOffice

An immediate reaction was requested, as we were seeking what respondents associated with LO rather than probing for a description or definition of it. On some occasions this caused respondents to hesitate before replying. Perhaps not surprisingly, some contributors with extensive experience of the project were hesitant in responding to this question, and one even commented: “It’s a hard question, because its not a factual question. I cannot use my mind.”

Overall, contributors gave a variety of ideological and emotional responses, such as: “freedom”, “something I believe in”, “It’s my project”, “a group of friends”. As put by one contributor: “[LibreOffice] is a project I have contributed to shape, and so there is also a lot of emotional participation”. Similarly, other respondents expressed “It has a very deep meaning for me, I guess, having done a lot of work there”, and “It’s my project, the project that I am working on, so, yeah.”

The concept also triggered a number of expressions of excitement, as illustrated in the following comments: “Exciting project that is fun to hack on”, “It’s very positive to hear people talk about LibreOffice”, and “It’s cool, it’s home, it’s something exciting”. Further, respondents also associated the concept with personal commitment. 
For example, as expressed by one interviewee: “It’s a group of friends and people who we work with, I would say.”

In addition, the concept also gave rise to a number of more rational associations. Some of those expressions relate to the quality of the software system, such as: “The best office suite in the world” and “[LibreOffice is an] interesting, exciting project with a huge amount of work, but very good, how do people work, how we work and what we manage to do”. Others relate to the development model used in the project: “Community developed office suite”, whereas yet others relate to the developed system itself: “Open Source office package”. Finally, some respondents seemed flattered when probed for their associations concerning the concept LO, responding jokingly: “I recognise the name”.

5.3. Participation in the LibreOffice project

The extent to which contributions from participants in the LO project are related to their professional activities varies amongst respondents. We note that contributions stem from both volunteer and paid-for activities, and responses revealed that contributors are employed by several different organisations; some are self-employed specialists.

Several respondents expressed that working on the LO project is part of their professional activities, as illustrated by the following responses: “I am working for LibreOffice in my professional activities”, “I am paid for working on LibreOffice”, and “It’s my full time job”. Further, some respondents also expressed that their incentives for participation in the project were motivated by a technical need arising from their professional activities, as illustrated by one respondent: “I wanted to use it to replace Microsoft Access, at what is now my day job”.

For several contributors there is significant congruence between their professional activities and their contributions to the project as volunteers. For example, one of the respondents expressed that “there is a huge overlap of my professional activities”, and another that “it is my professional activity … it’s not all of my job, I have other parts to my job I have to do … I do stuff in my free time as well”.

There were also those expressing that working on the LO project is in symbiosis with a professional job even though not directly part of it: “it is not related, but it is in harmony basically”. Yet others expressed that their incentives for participation were motivated by business opportunities: “I have a small company in [country X], and doing all kind of services, support for LibreOffice and the old OpenOffice, so that makes it logic to contribute in the project too, for me it’s a logic combination”.

Further, there are also contributors participating in the project primarily through volunteer activities. In the words of one respondent: “we do use LibreOffice in the company I work, but mostly the activities I do for LibreOffice is mostly as my hobby”.

Amongst respondents we also identified those for whom professional and volunteer activities seem to merge: “For me it’s like a hobby that turned into some occupation, and it’s very hard to draw a line between what I am doing privately and what I am doing as an employee, and it mostly matches the interest from both company and what I would personally do”.

5.4. Motivations for contributing to the LibreOffice project
Several interviewees found it difficult to single out specific issues that motivate them to contribute to the project. For example, as put by one contributor: “That’s a very hard question, isn’t it … I think everyone is just this mixed bundle, all sorts of motivations”. Another respondent expressed that: “There are so many answers to that. It’s kind of hard”.

Respondents expressed a number of different types of motivations for contributing to the LO project. Several comments are of an emotional nature, such as: “because it is fun and very rewarding”, “it’s fun to contribute and while you contribute the project gets ahead so it’s even more fun”, and “I want to do something that seems to be useful for people and significant. I think it’s the joy of relationship, and just working with other people and seeing good things happen”. Further, some emotional comments emphasised motivations for contributing to the project in the future: “in the future if it stays fun and the community stays a nice place to be in and, yeah, it’s … you can continue”.

Closely related to emotions, respondents also emphasised social rewards and social recognition as enablers for their motivation to contribute. For example, respondents expressed: “Cleaning up ugly things is socially rewarding” and also that “positive feedback is what drives me”.

Similarly, there are also ideological motivations expressed amongst respondents: “I believe in free software because I think that is the proper alternative to proprietary software” and “I care about software freedom”.

There are also intellectual motivations that seem to drive contributors. For example, one respondent motivated participation in the LO project by arguing that having a good office package is “one of the biggest tasks that doesn’t have already a good solution for it in Open Source”. Similarly, another respondent considered the establishment of a high-quality LO project “a professional challenge. Because not having any money, you have to be smarter than your competitors”.

For some respondents with a long-term commitment to the LO project, their participation has led to a desire to see the project succeed. As stated by one respondent: “I’ve invested plenty of time in this branch of software, so I really really have a personal desire to see it succeed”. Others expressed a motivation for improving the way of working in the LO project as follows: “It may not be readily visible but we still need to add more structure and more processes, and I think I want to continue to do that”.

Visionary, goal-driven motivations for the future of the LO project were also expressed: “it’s fun. I am convinced it’s the right thing to do. I think it’s the right project, at the right time, with the right people, and the right mind sets”. Similarly, in the words of another respondent: “I think we can change a lot with this project by running it differently and pushing borders and thinking outside the box there”. 
Further, motivations also seem to stem from frustration concerning a perceived lack of influence in the old OO project, as commented by one respondent: “I was active in OpenOffice.org project in the past, and there were lots of things that I loved of that product, but a lot of things that made me feel frustrated about influence, about things that were not picked up, and on the development side. And, so I am really motivated to work on LibreOffice, to make it better, to see it improve compared to the old OpenOffice, and that is a strong motivation”.

Finally, amongst respondents we observe strong commitment with the project. As expressed by one respondent: “it’s fun, it’s something that I like to do, and it’s not the first free software project that I contribute to. It’s something that I have been doing for good chunks of my life now”. Similarly, for some, such strong commitment and motivation for participation is also related to strong emotions: “It’s purely the love”.

5.5. Future outlook for the LibreOffice project

An overwhelming impression from responses is that contributors perceive a positive future for the LO project. Several respondents gave a number of emotional expressions, and we observe amongst respondents an expectation of a more diverse developer community in the future. For example, as stated by one respondent: “I believe that we will stay diversified and that we will be able to embrace more and more, not only individuals, but companies as well”.

Respondents raised the budget for the project as an issue and stressed that there is a “need to strengthen the project”. Other comments concern the way of working and how to organise work in the project, as illustrated by one respondent: “we still need to consolidate the organisation. We still need to increase the number of members”.

Several respondents envisaged a bright future for the LO project, as illustrated by the following comments: “Hereto, it has all the attributes of a very successful project. So, I believe that we will execute on the plan on releases as we have until recently, until now, because we have the time based schedule, that we always deliver on time”, “Whatever happens, it will continue in some way or another, some shape or another. I think that code base, it’s just too many users for it to disappear. It’s there to stay”, and “I think it is a bright future, and it grows, but it takes time”. Further, one respondent also expressed a view on the project in relation to an existing proprietary alternative as follows: “We are going to grow. We are going to take over the market, and we will have a follower called Microsoft behind us”.

However, there were also those predicting a somewhat more modest future for the LO project. For example, in the view of one respondent: “we keep running as we are, I think”.

Further, a number of comments also revealed that the evolution of the LO project seemed to have exceeded the expectations of the respondents, as illustrated by the following comments: “while this is a young project, I am surprised by how diverse it is and how healthy it is”, “I think we’re doing very well, and we had a major breakthrough, milestone, when we finally got these German authorities to prove our idea of a foundation. And we’re past this quite important milestone. 
Yeah, I am very positive about the future”, and “I think we are not yet aware of what is possible with it, and I am beginning to realise how much bigger this thing can get”.

The importance of the community and its role for the LO project was emphasised by a number of respondents as an enabler for its future success. Several comments signalled a strong identity for members of the LO community, as illustrated by one respondent who stressed the importance of community values as follows: “This is a community, not a company, so we don’t have titles”, and commented that there is consequently no need for business cards when working within the community. However, the same respondent suggested that there actually is a need for a business card in certain situations, such as when community members need to communicate with external organisations. Further, the importance of a vibrant community was also stressed by one respondent as follows: “the more rich and diverse and compelling we make our ecosystem, the stronger it is”.

Similarly, another comment stressed the importance of successful governance for a community as follows: “governance is key, if there is no governance at the end there is no project. So, some discipline is necessary, but the discipline can not go to the level of making the others scared to come inside. At the moment they are still a little bit scared. We are trying to make them less scared”.

5.6. Lessons learnt from participation in the LibreOffice project

From responses it is evident that contributors perceive participation in the LO project as positive and rewarding in a number of different ways. We observed a variety of different lessons learnt from participants in the project, and a number of comments touched upon excitement, the opportunities of open collaboration, and a positive, inclusive atmosphere that seems to promote learning.

Several respondents elaborated on their experiences from participation in the LO community, and attached a number of positive characteristics to the community. For example, as commented by one respondent with long experience of participation in the community: “it’s a true fun, diverse, vibrant, community project. . . . I started in OpenOffice, and it was fun in the beginning, but over the year you were able to see behind, and I didn’t like what I saw”. Similarly, another respondent stressed the possibility of having an impact by providing value for individuals, organisations and society more broadly: “the thrilling thing about LibreOffice is that it really makes a difference. You can see people using it and appreciating it”.

Further, another respondent stressed the opportunity of open collaboration as an important lesson from participation in the project, as illustrated by the following comment: “I think it really shows that cooperation in an open way is profitable, makes sense, and I think that is a very valuable lesson”. Similarly, another respondent perceived benefits of open collaboration as follows: “It’s things like this [name of a practitioner conference], meeting with people and collaborating with different people with different mentalities, and tolerating each others and each other’s ideas; and yeah, even with completely different approaches to the project and expectations, to get something big out of it, yeah”. 
Another respondent stressed the inherent nature of sharing experiences when collaborating in a community, which involves both providing and gaining valuable lessons, as follows: “I think that I’ve, at the end I have really got as much as I have given, because in term of human experiences, just incredible”.

Several respondents stressed the importance of the welcoming environment in the LO project, with particular emphasis on skills development. For example, as expressed by one respondent: “I think it’s good for my writing skills and coding skills”.

Similarly, several respondents stressed the welcoming nature of the project and an established practice of mentoring new contributors as something highly appreciated, as illustrated by the following comment: “I am pleased that we have much more welcoming environment for new developers to participate with us, and I am very pleased that a lot of these people have now very quickly become senior advisers in their own right. And that they, themselves, can feel free to mentor other people and bootstrap other new developers up to the same situation. To repeat the process on others that was done to them to make them valuable and respected developers with commit access.” Further, as indicated by another respondent, the mentoring process seems to be founded in each individual’s ability, with careful consideration in the LO project given to acknowledging and appreciating contributions from all contributors: “I think it’s the exceptionally welcoming nature of the LibreOffice community and the speed at which I was recognised for my contributions and my skills and my abilities. It’s not like that in every project, you know. . . . With LibreOffice it happens very fast”.

Finally, another lesson learnt expressed by one respondent clearly stressed the perception of feeling rewarded from contributing to the LO project: “The most important experience was the weeks before we actually switched the upstream, and all the preparation, and then going out public and seeing how in matter of few minutes the IRC channels that we created filled with people who started to download, use and actually build LibreOffice, and those tireless moments we spent on the IRC trying to fix the possible breakages they had and it was just a magic moment to see that the things were actually moving ahead. It’s emotional”.

6. Analysis

6.1. Analysis of community evolution over time

From our results we make a number of observations related to project activity. Firstly, there have been regular and frequent releases of stable versions of the software (LO, including the former development in OO) for a time period of more than ten years. Other examples of well-known OSS projects with release histories extending over many years are the Apache web server8 and the Linux kernel9, which have had frequent releases since 1995 and 1991, respectively. We note that, as for LO (and AOO), both these projects are governed by a foundation10 (i.e. third-phase governance according to the categorisation proposed by de Laat (2007)). Secondly, there has been substantial activity in LO (including the former development in OO) for more than ten years. Despite some variation between stable releases, our findings suggest a long-term trend towards a sustainable community, as we have not observed any signs of a lasting decline in community activity. 
As a comparison, there has been stable community activity over many years in the aforementioned Apache web server and Linux kernel projects.

Based on results concerning commitment to the projects, we find that a large proportion of the most influential committers in LO have been involved for long periods of time both before and after the fork from OO, which indicates that the developer community has a strong commitment to the LO branch. A strong commitment of contributors over long time periods has been observed earlier in a study on the Debian project, where it was observed that maintainers “tend to commit to the project for long periods of time” and that “the mean life of volunteers in the project is probably larger than in many software companies, which would have a clear impact on the maintenance of the software” (Michlmayr et al., 2007). Further, our results show that a relatively small proportion (5%) of the most active LO committers contribute the majority of commits (78%), and that the five most active committers contribute 33% of all commits in the LO project. In comparison, a relatively small proportion (5%) of the most active AOO committers contribute a smaller proportion of commits (33%), while the five most active committers contribute 60% of all commits in the AOO project. In acknowledging that our analysis of the AOO project is based on a significantly shorter time window than the LO project, we note that both projects have communities of committers larger than “the vast majority of mature OSS programs” (Krishnamurthy, 2002). Results concerning commitment to each project support findings from previous research which show that for OSS projects “the bulk activity, especially for new features, is quite highly centralised” (Crowston et al., 2012).

Results on retention of committers show that OO under Sun governance (SOO) and LO have been more successful in recruiting and retaining committers over time compared to OO under Oracle governance (OOO) and AOO. Results also show that there is no sign of any long-term decline in LO in terms of the number of currently active committers. Further, results concerning contributions to both the LO and AOO projects show that new developers (i.e. those who have not contributed to the OO project) provide limited contributions to the LO project (representing 0.3% of all LO commits) but a significant amount (32.1%) of the AOO commits. When considering long-term contributors (i.e. those who have contributed to all three projects) there are still limited contributions, except for AOO (representing 1.9% of all commits in OO, 6.1% of all commits in LO and 41.3% of all commits in AOO). Further, the two most dominant commitment patterns for committers who have contributed to the LO project are that committers only commit to LO, and that committers have done all their contributions in OO before starting to contribute to LO (together involving 94.7% of all LO committers and 89.7% of all LO commits). In comparison, the two most dominant commitment patterns for committers who have contributed to the AOO project are that committers have contributed to LO within the period during which they have contributed to AOO (together comprising 59.9% of all AOO committers and 55.6% of all AOO commits). Moreover, a clear majority (85.7%) of the LO committers have been directly recruited to LO, whereas less than half (39.5%) of the AOO committers have been directly recruited to AOO.
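To make the basis of these proportions concrete: figures of this kind can be derived by counting commits per author identity in a project’s version-control history and comparing the most active committers’ totals against the overall total. The sketch below is a minimal illustration of that calculation for a local git clone; it is not the authors’ actual tooling, and a faithful replication would also need to merge the multiple author identities (e.g. several e-mail addresses) that a single committer may use.

```python
import subprocess
from collections import Counter

def committer_shares(repo_path):
    """Share of commits held by the top 5 % and the top five committers."""
    # One author e-mail per output line; assumes git and a non-empty local clone.
    emails = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%ae"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    # Caveat: one person may commit under several e-mail addresses; a faithful
    # replication would merge such identities before counting.
    counts = sorted(Counter(emails).values(), reverse=True)
    total = sum(counts)
    top_5pct = max(1, round(0.05 * len(counts)))  # size of the top 5 % group
    return {
        "committers": len(counts),
        "share_top_5pct": sum(counts[:top_5pct]) / total,
        "share_top_five": sum(counts[:5]) / total,
    }

if __name__ == "__main__":
    # "path/to/clone" is a placeholder for a locally cloned repository.
    print(committer_shares("path/to/clone"))
```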
It is not uncommon that developers are simultaneously involved in more than one project (Lundell et al., 2010). However, our results show that only a limited number of contributors are simultaneously active in the LO and AOO projects.

6.2. Analysis of insights and experiences from the LibreOffice community

Results from the study indicate that the LO project has adopted a systematic approach, and supportive work practices, for mentoring and guiding new contributors. As an example, this is done via mentoring and the provision of “LibreOffice Easy Hacks”, which are specifically aimed at inexperienced contributors. Efforts made in this project seem to go beyond what is established practice in many other OSS projects. For any project it is important to promote organisational learning and to ease the introduction of new contributors to the project and its work practices, as has been recognised in previous research (Lundell et al., 2010). Further, our results also show that LO project participants seem keen to encourage and acknowledge contributions from new participants in the community.

Our results clearly show that use of a weak copyleft license is seen as appropriate for the LO project for a number of reasons. One reason is a perceived risk that the source code would otherwise not continue to be provided according to core principles of software freedom. This choice of Open Source license for the project has been referred to as adhering to a “keep-open” license (Engelfriet, 2010). In acknowledging that there are a number of factors affecting the attractiveness of a project, it seems evident that the choice of a “keep-open” license is considered appropriate amongst new contributors, as the project has managed to attract a significant number of them. Further, an additional indication of the preference for a “keep-open” license amongst those contributors to the LO project who also contributed to the OO project stems from results in our interviews. This in turn reinforces the observation (see above) that the majority of the contributors to the OO project who decided to continue contributing to one of the projects (AOO or LO) have chosen the LO branch.

An effect of the fork was that part of the OO community evolved into a new form, as founding members of the LO community stem from the OO community. Over time, the new LO project, managed and governed by TDF, has managed to attract a significant number of new contributors. This is in contrast with the approach taken by the AOO project, which adopted an already established structure for governance and work practices (ASF).

There is a complex inter-relationship between community and company values which impacts opportunities for long-term maintenance and support of OSS projects. A number of respondents express that besides their involvement in the LO community they are also affiliated with various commercial organisations. For some respondents there is also a symbiosis between their different involvements. Further, our results from respondents strongly support several of the motivational factors for individual participation in OSS projects that have been identified in earlier research (Bonaccorsi and Rossi, 2006).
In particular, social motivations such as the fun of contributing and the sense of belonging to a community are important to LO contributors. Another social motivation observed in the LO community is the opportunity to provide an alternative to proprietary software solutions. Further, we note that technological motivations, such as learning and the opportunity to receive contributions and feedback from the community, are also present amongst LO contributors. Some respondents, who are also active in small companies, see business opportunities in participating in the LO community. Hence, our study confirms earlier studies concerning individual motivations for participation in OSS projects.

6.3. Implications

The study has revealed a number of insights concerning governance and community evolution. For long-term contributors active under several governance regimes during more than ten years, there have been several changes in the way of working in the different communities.

Contributors starting under the OO project (under governance by Sun, followed by Oracle) and later active in the AOO project have experienced different corporate governance regimes followed by adoption of the Apache way of working. This transition of the project into governance under the existing ASF has involved a significant change for participants: changed governance, changing conditions for contributors due to the adoption of institutionalised practices, and a change from a weak copyleft license to a permissive license.

On the other hand, contributors starting under the OO project and later active in the LO project have also experienced different corporate governance regimes (Sun and Oracle), followed by the adoption of a new way of working implied by the establishment of a tailor-made foundation (TDF) as a legal framework for maintenance of the LO project. For these contributors, there has been continued use of a weak copyleft license. In this way, our results show that contributors shaped TDF with a view to supporting their preferred way of working in the LO project.

It should be noted that choosing the same weak copyleft license as the base project when establishing the LO project was possible without prior IPR clearance. Further, this was possible despite the fact that the copyright for the code base of the base project was controlled by a different organisation (the Oracle corporation). These circumstances allowed the LO project to immediately continue development on the same code base. However, when establishing the AOO project there was a need for IPR clearance in connection with transferring copyright of the code base to ASF and the change to a new Open Source license. This transfer to ASF involved significant effort and resulted in a significant time window between the start of the AOO project and the first release of AOO.

From the analysis of the three specific projects investigated (LO, OO, and AOO), it is shown that significant development experience – both in terms of contributors and their contributions – has been maintained and transferred from the OO project into the two independent projects (LO and AOO).

The importance of establishing a strong sense of community in the context of large global OSS projects is closely related to the importance of establishing a sense of teamness in global software development projects (Lings et al., 2007).
In both Open Source and proprietary licensed software projects there is a need to manage collaboration involving developers from different socio-cultural backgrounds. However, a key difference between Open Source based collaboration in large community based projects and large inter-organisational collaborations using proprietary software in global contexts lies in the possibility of successfully forking an OSS project and establishing a new project with separate governance. The importance of face-to-face meetings is recognised both in the context of inter-organisational collaboration in the field of global software engineering (Lings et al., 2007) and in the large, globally distributed OSS projects analysed in this study. Further, from our analysis in this study we note that the importance of establishing a common vision for an OSS community relates to experiences in the context of global software engineering concerning the importance of gaining “executive support from all the sites” in a globally distributed software development project (Paasivaara, 2011).

7. Discussion and conclusions

7.1. Discussion

The transition and formation of the LibreOffice community seems to have been successful. However, we acknowledge the short time period after the fork (33 months), and that our early indications of a successful LibreOffice community after the transition from OpenOffice.org need to be confirmed by an analysis over a longer time period at a later stage. As a comparison, a well-known fork with significant uptake and a long-term sustainable community is OpenBSD,11 which was forked from NetBSD in 1995 and still has an active developer community (Gmane, 2013).

11 http://www.openbsd.org/.

Further, when considering Open Source software products in long-term maintenance scenarios for potential adoption, it is critical to understand and engage in the communities related to the Open Source software project. For the base project analysed (OpenOffice.org), a governance structure had been established, and the OpenOffice.org community was governed by its community council (Openoffice, 2013). Similarly, the investigated branch after the fork (LibreOffice) has also established a governance structure, referred to as the Document Foundation (Documentfoundation, 2013a). Despite such explicitly documented governance structures, project participants may decide to fork a project, which happened when the Document Foundation established the LibreOffice project as a fork from OpenOffice.org on 28 September 2010. Our results suggest that this fork may actually be successful, which would make the LibreOffice project an exception to the norm, since previous research claims that there have been “few successful forks in the past” (Ven and Mannaert, 2008).

From our results, it remains to be seen to what extent the LibreOffice and Apache OpenOffice projects may successfully evolve their projects and associated communities in a way that is sustainable long-term. So far it seems that LibreOffice has been the more successful project in terms of growing its associated community. Our results suggest that the choice of Open Source license significantly impacts the conditions for attracting contributions to Open Source projects. Amongst contributors to the LibreOffice project there is a clear preference for contributing to an Open Source project which uses the same weak copyleft license as the base project.
This use of a keep-open license in the LibreOffice project may significantly impact contributors’ willingness to contribute to an Open Source project for which they do not possess the copyright. This may be so both amongst volunteer and company-affiliated developers. Our results show strong indications of congruence between community members’ professional roles and their contributions to the LibreOffice community.

We acknowledge that the LibreOffice project has been established and openly available for external contributions for a longer time period than the Apache OpenOffice project. This can partly be explained by the later start of the Apache OpenOffice project, since there was a state of void between 15 April 2011, when Oracle abandoned OpenOffice.org, and 13 June 2011, when Apache OpenOffice was established as an Apache Software Foundation project. Further, we note that the first commits in the Apache OpenOffice repository were contributed in August 2011. Therefore, it is perhaps not surprising that a number of contributors from the OpenOffice.org project became involved in the LibreOffice project, since there was no active OpenOffice.org project to contribute to for several months. However, it should be noted that after August 2011, when the first commits were contributed and Apache OpenOffice became openly available, committers have continued to contribute to the LibreOffice project.

The situation analysed in the paper has an inherent complexity in that it involves three projects between which there are complex interactions, influences, and relationships, both with respect to code and community dynamics. This study therefore challenges previously established categorisations of fork outcomes, and also how the concept of a fork is defined, since such categorisations and definitions often consider only the relationship between two projects, often referred to as the base and the forked project (Robles and Gonzalez-Barahona, 2012; Wheeler, 2007). Further, this study has shown that individual contributors in related OSS developer communities can contribute to several projects over a period of time, including both the base and the forked project.

The analysis of the sustainability of Open Source software communities and the evolution of two independent Open Source software projects after a fork shows that there is potential for successful branching. Our specific emphasis has been to investigate insights and experiences from community members of the project which was established as an outcome of a fork. From this we find that long-term community members seem to have managed to establish a new project, and a tailor-made foundation for its governance, in a way that is appealing to both old and new contributors.

In situations such as the one analysed in this study there is no one-to-one correspondence between Open Source software project and Open Source software community. Consequently, when assessing the sustainability of such communities it is important to recognise that individual contributors are involved in multiple projects. Therefore, any such assessment must take into account that community involvement goes beyond any single project.

Irrespective of how the relationships between the projects are perceived in the transition from the base project to the two new projects, our results from the analysis of the three inter-related projects, with associated transitions from the OpenOffice.org project, go beyond previously established categorisations of fork outcomes.
Our results thereby provide valuable insights for extending the existing body of knowledge concerning forks.

7.2. Conclusions

Our study presents findings from the first comprehensive analysis of Open Source software projects involving a fork. The study reveals a number of important findings related to the long-term sustainability of Open Source software communities.

Related to the characterisation of community evolution over time for the three inter-related Open Source projects, the study presents several important findings. First, the LibreOffice project shows no sign of long-term decline, and as such the study details circumstances under which a fork can be successful. Second, the majority of contributors to the OpenOffice.org project who continued in one of the succeeding projects chose to continue contributing to the LibreOffice project. Further, LibreOffice has attracted the long-term and most active committers in the OpenOffice.org project, thereby demonstrating that successful transfer and evolution of know-how and work practices can be achieved beyond individual Open Source software projects. Third, OpenOffice.org (under the governance of Sun) and LibreOffice have been more successful in recruiting and retaining committers over time compared to OpenOffice.org (under the governance of Oracle) and Apache OpenOffice. This suggests that effective governance and work practices that are appreciated by community members are fundamental for long-term sustainability. Fourth, a minority of the LibreOffice committers have been recruited from OpenOffice.org, and they have contributed a clear majority of the LibreOffice commits. On the other hand, the vast majority of LibreOffice committers have been directly recruited to the project, but their commits to the project are in the minority. From this we conclude that, apart from community efforts to make it easier to contribute to an Open Source software project, it is also important to address challenges related to the long-term retention of contributors.

The study makes a novel contribution by revealing important insights and experiences from members of the LibreOffice community, and provides explanations for why the LibreOffice project has evolved as it has. There is a clear preference for use of a copyleft license amongst contributors to the LibreOffice project, both amongst volunteers and those affiliated with companies, and the use of such a license in the LibreOffice project is perceived by many of them as a prerequisite for participation. This suggests that such an Open Source license is preferred amongst contributors in Open Source software projects with a strong community identity. Further, the study shows that it is important that values amongst contributors and other stakeholders are congruent with the effects of the particular Open Source license used. Results from the study elaborate on tensions in a community and detail circumstances that community members need to be wary of in order to avoid an ineffective collaboration climate in an Open Source software project. Further, the study reveals important motivations for joining and contributing to the LibreOffice project over time, including: a perceived welcoming atmosphere in the community; a sense of supportive and effective work practices; appreciation for independence and control of developed solutions by members of the community; and a strong identity and appraisal of community diversity.
Thereby the study has detailed the importance of nurturing Open Source software communities in order to establish long-term sustainable Open Source software projects. From a contributor perspective, the study shows that Open Source software communities can outlive Open Source software projects. In particular, for projects with devoted communities holding strong convictions about the future directions of projects and communities, we find strong indications that forking can be used as an effective strategy for overcoming perceived obstacles in a project’s current way of working in order to improve the situation.

The findings from our analysis of the LibreOffice project (and the related OpenOffice.org and Apache OpenOffice projects) contribute new insights concerning challenges related to the long-term sustainability of Open Source software communities. For software systems with long life-cycles, the success with which an Open Source software project manages to recruit and retain new contributors to its community is critical for its long-term sustainability. Hence, achieving good practice with respect to the governance of Open Source software projects is perceived by community members as a fundamental challenge for establishing sustainable communities.

References

Ågerfalk, P., Fitzgerald, B., 2008. Outsourcing to an unknown workforce: exploring open sourcing as a global sourcing strategy. MIS Quarterly 32 (2), 385–410.

Apache, 1999. The Apache Software Foundation Board of Directors Meeting Minutes, http://www.apache.org/foundation/records/minutes/1999/board_minutes_1999_06_01.txt (accessed June 2013).

Apache, 2013a. Apache OpenOffice, http://openoffice.apache.org/ (accessed June 2013).

Apache, 2013b. The Apache Software Foundation – Foundation Project, http://www.apache.org/foundation/ (accessed June 2013).

Bach, P., Carroll, J., 2010. Characterizing the dynamics of open user experience design: the cases of Firefox and OpenOffice.org. JAIS 11 (special issue), 902–925.

Bacon, J., 2009. The Art of Community. O’Reilly Media, Sebastopol.

Blondelle, G., Arberet, P., Rossignol, A., Lundell, B., Labeze, P., Berrendonner, R., Gauffret, P., Faudot, R., Langlois, B., Maioncello, L., Moro, P., Rodriguez, J., Puerta Peña, J.M., Bonafous, E., Mueller, R., 2012a. Polarsys: towards long-term availability of engineering tools for embedded systems. In: Proceedings of the Sixth European Conference on Embedded Real Time Software and Systems (ERTS 2012), Toulouse, France, 1–2 February.

Blondelle, G., Langlois, B., Gauffret, P., 2012b. How Polarsys addresses Long Term Support and develops the ecosystem of Eclipse tools for Critical Embedded Systems. EclipseCon US 2012, Reston, Virginia, 26–28 March, http://www.eclipsecon.org/2012/sessions/how-polarsys-addresses-long-term-support-and-develops-ecosystem-eclipse-tools-critical-embe

Bonaccorsi, A., Rossi, C., 2006. Comparing motivations of individual programmers and teams to take part in the open source movement: from community to business. Knowledge, Technology & Policy 18 (4), 60–64.

Brock, A., 2013. Understanding commercial agreements with open source projects. In: Coughlan, S. (Ed.), Thoughts on Open Innovation – Essays on Open Innovation from Leading Thinkers in the Field. OpenForum Europe Ltd for OpenForum Academy, Brussels.

Byfield, B., 2010. The Cold War Between OpenOffice.org and LibreOffice.
Linux Magazine, http://www.linux-magazine.com/Online/Blogs/Off-the-Beat-Bruce-Byfield-s-Blog/The-Cold-War-Between-OpenOffice.org-and-LibreOffice (accessed June 2013).

Conlon, M.P., 2007. An examination of initiation, organization, participation, leadership, and control of success in Open Source software development projects. Information Systems Education Journal 5 (38), 1–13.

Crn, 1999. Sun Microsystems Buys Star Division, http://www.crn.com/news/channel-programs/18804525/sun-microsystems-buys-star-division.htm (accessed June 2013).

Crowston, K., Annabi, H., Howison, J., 2003. Defining Open Source Software project success. In: Proceedings of the International Conference on Information Systems (ICIS 2003), Seattle, WA, USA, 14–17 December, pp. 327–340.

Crowston, K., Howison, J., Annabi, H., 2006. Information systems success in free and Open Source software development: theory and measures. Software Process: Improvement and Practice 11 (2), 123–148.

Crowston, K., Kangning, W., Howison, J., Wiggins, A., 2012. Free/Libre open-source software development: what we know and what we do not know. ACM Computing Surveys 44 (2) (article 7).

de Laat, P., 2007. Governance of open source software: state of the art. Journal of Management and Governance 11 (2), 165–177.

Deshpande, A., Riehle, D., 2008. The total growth of Open Source. In: Russo, B., et al. (Eds.), Open Source Development, Communities and Quality. IFIP Advances in Information and Communication Technology, vol. 275. Springer, New York, pp. 197–209.

Dinh-Trong, T.T., Bieman, J.M., 2005. The FreeBSD project: a replication case study of open source development. IEEE Transactions on Software Engineering 31 (6), 481–494.

Documentfoundation, 2013a. The Document Foundation, http://www.documentfoundation.org/ (accessed June 2013).

Documentfoundation, 2013b. The Document Foundation Manifesto, http://www.documentfoundation.org/pdf/manifesto.pdf (accessed June 2013).

Documentfoundation, 2013c. The Document Foundation – Our Supporters, http://www.documentfoundation.org/supporters/ (accessed June 2013).

Engelfriet, A., 2010. Choosing an Open Source license. IEEE Software 27 (1), 48–49.

Feitelson, D.G., 2012. Perpetual development: a model of the Linux kernel life cycle. Journal of Systems and Software 85 (4), 859–875.

Gamalielsson, J., Lundell, B., 2011. Open Source communities for long-term maintenance of digital assets: what is offered for ODF & OOXML? In: Hammouda, I., Lundell, B. (Eds.), Proceedings of SOS 2011: Towards Sustainable Open Source. Tampere University of Technology, Tampere, pp. 19–24, ISBN 978-952-15-2411-0, ISSN 1737-836X.

Gamalielsson, J., Lundell, B., 2012. Long-term sustainability of Open Source software communities beyond a fork: a case study of LibreOffice. In: Hammouda, I., et al. (Eds.), Open Source Systems: Long-Term Sustainability. IFIP Advances in Information and Communication Technology, vol. 378. Springer, Heidelberg, pp. 29–47.

Gamalielsson, J., Lundell, B., Lings, B., 2010. The Nagios community: an extended quantitative analysis. In: Ågerfalk, P., et al. (Eds.), Open Source Software: New Horizons. IFIP Advances in Information and Communication Technology, vol. 319. Springer, Berlin, pp. 85–96.

Gamalielsson, J., Lundell, B., Mattsson, A., 2011. Open Source software for model driven development: a case study. In: Hissam, S. (Ed.), Open Source Systems: Grounding Research.
IFIP Advances in Information and Communication Technology, vol. 365. Springer, Heidelberg, pp. 348–367.

German, D., 2003. The GNOME project: a case study of open source global software development. Journal of Software Process: Improvement and Practice 8 (4), 201–215.

Gmane, 2013. Information about gmane.os.openbsd.cvs, http://dir.gmane.org/gmane.os.openbsd.cvs (accessed June 2013).

Huysmans, F., Ven, K., Verelst, J., 2008. Reasons for the non-adoption of OpenOffice.org in a data-intensive administration. First Monday 13 (10).

IBM, 2011. IBM to Contribute to New, Proposed OpenOffice.org Project, http://www-03.ibm.com/press/us/en/pressrelease/34638.wss (accessed June 2013).

Israeli, A., Feitelson, D.G., 2010. The Linux kernel as a case study in software evolution. Journal of Systems and Software 83 (3), 485–501.

Izurieta, C., Bieman, J., 2006. The evolution of FreeBSD and Linux. In: Proceedings of the 5th ACM-IEEE International Symposium on Empirical Software Engineering (ISESE’06), September 21–22, Rio de Janeiro, Brazil.

Koponen, T., Lintula, H., Hotti, V., 2006. Defect reports in the Open Source Software maintenance process – OpenOffice.org case study. In: Proceedings of Software Engineering Applications (SEApp’06), Dallas, TX, USA, 13–15 November.

Krishnamurthy, S., 2002. Cave or community? An empirical examination of 100 mature Open Source projects. First Monday 7 (6).

Lee, S.-Y.T., Kim, H.-W., Gupta, S., 2009. Measuring open source software success. Omega 37 (2), 426–438.

Lings, B., Lundell, B., 2005. On the adaptation of Grounded Theory procedures: insights from the evolution of the 2G method. Information Technology & People 18 (3), 196–211.

Lings, B., Lundell, B., Ågerfalk, P.J., Fitzgerald, B., 2007. A reference model for successful distributed development of software systems. In: Proceedings of the Second International Conference on Global Software Engineering (ICGSE 2007). IEEE Computer Society, pp. 130–139.

Linuxuser, 2010. OpenOffice.org Community Announces The Document Foundation, http://www.openoffice.org/press-release/announces-the-document-foundation (accessed June 2013).

Lopez-Fernandez, L., Robles, G., Gonzalez-Barahona, J.M., Herraiz, I., 2006. Applying social network analysis techniques to community-driven Libre software projects. International Journal of Information Technology and Web Engineering 1 (3), 27–48.

Lundell, B., 2011. e-Governance in public sector ICT-procurement: what is shaping practice in Sweden? European Journal of ePractice 12 (6), http://www.epractice.eu/files/European%20Journal%20of%20ePractice%20Volume%2012%266.pdf.

Lundell, B., Gamalielsson, J., 2011. Towards a sustainable Swedish e-Government practice: observations from unlocking digital assets. In: Proceedings of the IFIP 11th Government Conference 2011 (EGOV 2011), Delft, The Netherlands, 28 August–2 September 2011.

Lundell, B., Lings, B., Lindqvist, E., 2010. Open Source in Swedish companies: where are we? Information Systems Journal 20 (6), 519–535.

Lundell, B., Lings, B., Syberfeldt, A., 2011. Practitioner perceptions of Open Source software in the embedded systems area. Journal of Systems and Software 84 (9), 1540–1549.

Madey, G., Freeh, V., Tynan, R., 2004. Modeling the F/OSS community: a quantitative investigation. In: Koch, S. (Ed.), Free/Open Source Software Development. Idea Group Publishing, Hershey, pp. 203–221.

Marketwire, 2011a.
Oracle Announces Its Intention to Move OpenOffice.org to a Community-Based Project, http://www.marketwire.com/press-release/oracle-announces-its-intention-to-move-openofficeorg-to-a-community-based-project-nasdaq-orcl-1503027.htm (accessed June 2013).

Marketwire, 2011b. Oracle to Contribute to Apache, http://www.marketwire.com/press-release/statements-on-openofficeorg-contribution-to-apache-nasdaq-orcl-1521400.htm (accessed June 2013).

Martinez-Romo, J., Robles, G., Ortuño-Pérez, M., Gonzalez-Barahona, J.M., 2008. Using social network analysis techniques to study collaboration between a FLOSS community and a company. In: Russo, B., et al. (Eds.), Open Source Development, Communities and Quality. IFIP Advances in Information and Communication Technology, vol. 275. Springer, New York, pp. 171–186.

Mens, T., Fernández-Ramil, J., Degrandt, S., 2008. The evolution of Eclipse. In: Proceedings of the 24th IEEE International Conference on Software Maintenance (ICSM 2008), pp. 386–395.

Michlmayr, M., 2009. Community management in Open Source projects. The European Journal for the Informatics Professional X (3), 22–26.

Michlmayr, M., Robles, G., Gonzalez-Barahona, J.M., 2007. Volunteers in large Libre software projects: a quantitative analysis. In: Sowe, S.K., et al. (Eds.), Emerging Free and Open Source Software Practices. IGI Publishing, Hershey, pp. 1–24.

Midha, V., Palvia, P., 2012. Factors affecting the success of Open Source software. Journal of Systems and Software 85 (4), 895–905.

Mockus, A., Fielding, R.T., Herbsleb, J.D., 2002. Two case studies of Open Source software development: Apache and Mozilla. ACM Transactions on Software Engineering and Methodology 11 (3), 309–346.

Moon, Y.J., Sproull, L., 2000. Essence of distributed work: the case of the Linux kernel. First Monday 5 (12), 1–7.

Müller, R., 2008. Open Source – Value Creation and Consumption. In: Open Expo, Zürich, 24–25 September.

Nouws, L., 2011. LibreOffice – the first year and looking forward! In: Presented at ODF Plugfest, Gouda, Netherlands, 2011-11-18, http://plugfest.

Nyman, L., Mikkonen, T., Lindman, J., Fougère, M., 2012. Perspectives on code forking and sustainability in Open Source software. In: Hammouda, I., et al. (Eds.), Open Source Systems: Long-Term Sustainability. IFIP Advances in Information and Communication Technology, vol. 378. Springer, Heidelberg, pp. 274–279.

Openoffice, 2002. OpenOffice.org Community Announces OpenOffice.org 1.0, Free Office Productivity Software, http://www.openoffice.org/about_us/oooe_release.html (accessed June 2013).

Openoffice, 2004. OpenOffice.org Is Four, http://www.openoffice.org/about_us/birthday4.html (accessed June 2013).

Openoffice, 2012. The Apache OpenOffice Project Announces Apache OpenOffice™ 3.4, http://www.openoffice.org/news/aoo34.html (accessed June 2013).

Openoffice, 2013. Community Council, http://wiki.services.openoffice.org/wiki/Community_Council (accessed June 2013).

Oracle, 2010. Oracle Completes Acquisition of Sun, http://www.oracle.com/us/corporate/press/044428 (accessed June 2013).

Paasivaara, M., 2011. Coaching global software development projects. In: Proceedings of the Sixth International Conference on Global Software Engineering (ICGSE 2011). IEEE Computer Society, pp. 84–93.

Pclosmag, 2011. Free At Last!
LibreOffice 3.3 Released, http://pclosmag.com/html/Issues/201103/page14.html (accessed June 2013).

Raja, U., Tretter, M.J., 2012. Defining and evaluating a measure of Open Source project survivability. IEEE Transactions on Software Engineering 38 (1), 163–174.

Ray, B., Kim, M., 2012. A case study of cross-system porting in forked projects. In: Proceedings of the 20th ACM SIGSOFT International Symposium on the Foundations of Software Engineering, 11–16 November 2012, Cary, NC.

Robert, S., 2006. On-board development – the open-source way. In: IST/ARTEMIS Workshop, Helsinki, 22 November.

Robles, G., Gonzalez-Barahona, J.M., 2012. A comprehensive study of software forks: dates, reasons and outcomes. In: Hammouda, I., et al. (Eds.), Open Source Systems: Long-Term Sustainability. IFIP Advances in Information and Communication Technology, vol. 378. Springer, Heidelberg, pp. 1–14.

Robles, G., Gonzalez-Barahona, J.M., Michlmayr, M., 2005. Evolution of volunteer participation in Libre software projects: evidence from Debian. In: Proceedings of the First International Conference on Open Source Systems (OSS 2005), pp. 100–107.

Rossi, B., Scotto, M., Sillitti, A., Succi, G., 2006. An empirical study on the migration to Open Source software in a public administration. International Journal of Information Technology and Web Engineering (IJITWE) 1 (3), 64–80.

Rossi, B., Russo, B., Succi, G., 2009. Analysis of Open Source development evolution iterations by means of burst detection techniques. In: Boldyreff, C., et al. (Eds.), Open Source Ecosystems: Diverse Communities Interacting. IFIP Advances in Information and Communication Technology, vol. 299. Springer, Berlin, pp. 83–93.

Samoladas, I., Angelis, L., Stamelos, I., 2010. Survival analysis on the duration of open source projects. Information and Software Technology 52 (9), 902–922.

Santos, C., Kuk, G., Kon, F., Pearson, J., 2013. The attraction of contributors in free and Open Source software projects. Journal of Strategic Information Systems 22 (1), 45–69.

Sen, R., Singh, S.S., Borle, S., 2012. Open Source software success: measures and analysis. Decision Support Systems 52 (2), 364–372.

Severance, C., 2012. The Apache Software Foundation: Brian Behlendorf. Computer 45 (1), 1–6.

Seydel, J., 2009. OpenOffice.org: when will it be ready for prime time? In: Proceedings of the Southwest Decision Sciences Institute Conference (SWDSI), 25–28.

Shibuya, B., Tamai, T., 2009. Understanding the process of participating in open source communities. In: Proceedings of the 2009 ICSE Workshop on Emerging Trends in Free/Libre/Open Source Software Research and Development. IEEE Computer Society, Washington, DC, USA, pp. 1–4.

Subramaniam, C., Sen, R., Nelson, M.L., 2009. Determinants of Open Source software project success: a longitudinal study. Decision Support Systems 46 (2), 576–585.

Ven, K., Mannaert, H., 2008. Challenges and strategies in the use of Open Source Software by Independent Software Vendors. Information and Software Technology 50 (9–10), 991–1002.

Ven, K., Huysmans, P., Verelst, J., 2007. The adoption of open source desktop software in a large public administration. In: Proceedings of the 13th Americas Conference on Information Systems (AMCIS 2007), 9–12 August, Keystone, CO.

Ven, K., Van Kerckhoven, G., Verelst, J., 2010.
The adoption of open source desktop software: a qualitative study of Belgian organizations. International Journal of IT/Business Alignment and Governance (IJITBAG) 1 (4), 1–17.

Viseur, R., 2012. Fork impacts and motivations in free and open source projects. International Journal of Advanced Computer Science and Applications (IJACSA) 3 (2), 117–122.

Wang, J., 2012. Survival factors for Free Open Source software projects: a multi-stage perspective. European Management Journal 30 (4), 352–371.

Wheeler, D.A., 2007. Why Open Source Software/Free Software (OSS/FS, FLOSS, or FOSS)? Look at the Numbers!, http://www.dwheeler.com/oss_fs_why.html (accessed June 2013).

Wiggins, A., Howison, J., Crowston, K., 2009. Heartbeat: measuring active user base and potential user interest in FLOSS projects. In: Boldyreff, C., et al. (Eds.), Open Source Ecosystems: Diverse Communities Interacting. IFIP Advances in Information and Communication Technology, vol. 299. Springer, Berlin, pp. 94–104.

Jonas Gamalielsson is a researcher at the University of Skövde’s Informatics Research Centre. He has conducted research on open source and open standards in several projects. He has been involved in the Open Source Action (OSA) project (2008–2010), the Nordic (NordForsk) OSS Researchers Network (2009–2012), and the ITEA2 project OPEES (Open Platform for the Engineering of Embedded Systems). Further, he is participating in the ORIOS (Open Source based Reference implementations for Open Standards) project. He has also been involved in the Fifth and Eighth International Conferences on Open Source Systems (OSS 2009 and OSS 2012).

Björn Lundell is a senior researcher at the University of Skövde’s Informatics Research Centre. He has been researching the Open Source phenomenon for several years and has participated in a number of research projects in different leading roles, including: co-lead for a work package in the EU FP6 CALIBRE project (2004–2006), project manager of the Swedish National Research Project OSS (2005–2008), and, currently, project leader for the ORIOS project (2012–2015). He is a founding member of IFIP WG 2.13 on Open Source Software, and was program co-chair for the Eighth International Conference on Open Source Systems (OSS 2012).
{"id": "9b7d44c772275b7151f5385189a5ff8dbd978044", "text": "The Labor of Maintaining and Scaling Free and Open-Source Software Projects\n\nR. STUART GEIGER\u2217, University of California, San Diego; Department of Communication and Halicioglu Data Science Institute, USA\nDOROTHY HOWARD, University of California, San Diego; Department of Communication and Feminist Labor Lab, USA\nLILLY IRANI, University of California, San Diego; Department of Communication, The Design Lab, and Feminist Labor Lab, USA\n\nFree and/or open-source software (or F/OSS) projects now play a major and dominant role in society, constituting critical digital infrastructure relied upon by companies, academics, non-profits, activists, and more. As F/OSS has become larger and more established, we investigate the labor of maintaining and sustaining those projects at various scales. We report findings from an interview-based study with contributors and maintainers working in a wide range of F/OSS projects. Maintainers of F/OSS projects do not just maintain software code in a more traditional software engineering understanding of the term: fixing bugs, patching security vulnerabilities, and updating dependencies. F/OSS maintainers also perform complex and often-invisible interpersonal and organizational work to keep their projects operating as active communities of users and contributors. We particularly focus on how this labor of maintaining and sustaining changes as projects and their software grow and scale across many dimensions. In understanding F/OSS to be as much about maintaining a communal project as it is maintaining software code, we discuss broadly applicable considerations for peer production communities and other socio-technical systems more broadly.\n\nCCS Concepts: \u2022 Social and professional topics \u2192 Computer supported cooperative work; Socio-technical systems; Computing profession; Project and people management; \u2022 Software and its engineering \u2192 Open source model.\n\nAdditional Key Words and Phrases: open source, free software, maintenance, infrastructure, labor\n\nACM Reference Format:\nR. Stuart Geiger, Dorothy Howard, and Lilly Irani. 2021. The Labor of Maintaining and Scaling Free and Open-Source Software Projects. Proc. ACM Hum.-Comput. Interact. 5, CSCW1, Article 175 (April 2021), 28 pages. https://doi.org/10.1145/3449249\n\n1 INTRODUCTION\n\nFree and/or open-source software (or F/OSS) refers to a broad set of working processes, social movements, and organizations that have formed around the production and distribution of software,\n\n\u2217The majority of the work on this project was conducted when Geiger was affiliated with the Berkeley Institute for Data Science at the University of California, Berkeley\n\nAuthors\u2019 addresses: R. 
Stuart Geiger, University of California, San Diego; Department of Communication and Halicioglu Data Science Institute, 9500 Gilman Dr, La Jolla, California, USA, 92093; Dorothy Howard, University of California, San Diego; Department of Communication and Feminist Labor Lab, 9500 Gilman Dr, La Jolla, California, USA, 92093; Lilly Irani, University of California, San Diego; Department of Communication, The Design Lab, and Feminist Labor Lab, 9500 Gilman Dr, La Jolla, California, USA, 92093.

This work is licensed under a Creative Commons Attribution International 4.0 License.
© 2021 Copyright held by the owner/author(s).
2573-0142/2021/4-ART175. https://doi.org/10.1145/3449249

1 INTRODUCTION

Free and/or open-source software (or F/OSS) refers to a broad set of working processes, social movements, and organizations that have formed around the production and distribution of software, with a complex and contested history going back decades. These movements have been extensively studied from many disciplinary perspectives, as well as the subject of substantial commentary from its members, across its many factions [e.g. 49, 102, 125]. These software projects publicly release their source code, rather than following the various commercial models of software in which firms require payment to use software and/or restrict the ability for users to modify the software. Practitioners often describe F/OSS as being ‘free’ in two ways: free in being available at no cost (called “free as in beer”), and free in having source code available and licensed such that users can modify it (called “free as in speech”) [78]. However, it is important to ask how the work of maintaining these projects fits into these paradigms of free-ness, when F/OSS and other similar peer production projects require labor and material resources [41, 84, 124].

In prior decades, many early F/OSS projects began as hobbyist efforts to build alternatives to commercial proprietary software from the tech industry. Many early contributors volunteered spare time or negotiated with their employer to let them spend work time on F/OSS [8, 27, 29, 64, 76, 94]. Many F/OSS projects have become more commercial and part of the tech sector over the past two decades [10, 47]. Today, F/OSS has grown such that many projects have become the dominant product in their sector and are extensively relied upon by commercial software firms (e.g. Linux, Apache, Python). Many of the most successful F/OSS projects are not user-facing applications, but software infrastructure that is relied upon by companies inside and outside of the software industry, such as operating systems, programming languages, software libraries, servers, and web components. A 2020 survey of 950 enterprise-sized companies across sectors reported that 95% said open source software was important to their infrastructure strategy and 77% would adopt more open source software the next year [103].
F/OSS is also relied upon by government entities, non-profits, and activist movements, where the free cost and the ability to modify it can be crucial.

As F/OSS projects have become more critically embedded into organizations and economies, there has been a major shift toward questions of “sustainability” within many projects, especially those that began as volunteers’ side projects. This term is used to call attention to whether projects will keep developing and maintaining what others rely on, as all software must be maintained to continue to be useful to its users. Nadia Eghbal’s influential report on the topic opens with the Heartbleed bug in OpenSSL, a F/OSS software library used by two-thirds of websites to handle encryption, which led to the worst security vulnerability in the web’s history [41]. Despite the critical centrality of OpenSSL, the project’s maintainers had long struggled to find the time and money to work on it. Eghbal quotes the lead maintainer’s public post: “The mystery is not that a few overworked volunteers missed this bug; the mystery is why it hasn’t happened more often” [88]. Eghbal’s recent work suggests that small numbers of individual developers often do the bulk of the work in many F/OSS projects and can have somewhat transactional relationships with contributors and users, in contrast to how predominant narratives present F/OSS as composed of large collaboration-driven communities [42].

Our research question asks how the work of maintaining these projects changes as F/OSS projects become key dependencies for others, including well-resourced organizations in and out of the tech sector. We conducted 37 qualitative interviews with current or former F/OSS contributors and maintainers. Our focus was projects that began as purely-volunteer efforts and have since become widely relied upon as infrastructure for other organizations beyond the project. We find that as projects scale across all kinds of dimensions — number of users, contributors, or maintainers; kinds of users, contributors, or maintainers; size, complexity, and features of the codebase; interdependence in a software ecosystem; and more — the work of maintaining the project and the meaning of being a maintainer can dramatically change. Scale brings new tasks and changes the nature of existing tasks. For example, for projects with few or no users, providing technical support to users can be an exciting opportunity to grow the community around the project. Yet for a large-scale project with millions of users, this can become an overwhelming flood of demands that requires establishing specific rules, roles, and norms, such as developing processes for triaging user support requests.

In particular, we find that the ostensibly-technical work of software engineering takes on more organizational, communicative, and even competitive aspects at larger scales. This is a well-established theme about the socio-technical nature of computer-supported cooperative work, of software engineering, and of work in general. However, our study details how the activities and experiences of this maintenance work change as projects grow, develop, and become embedded within broader networks of people, code, money, and institutions, including corporations, governments, academia, non-profits, and other F/OSS projects.
We conclude by discussing the “scalar labor” of managing a project as it scales, such as how the deferral of this labor of scaling can create consequences for projects down the line – a problem we term “scalar debt.” Maintenance tasks pile up, requiring a massive amount of often less-visible work to build more organizational capacity to keep up with an onslaught of demands. Finally, we discuss how F/OSS maintainers of popular projects also sometimes face additional work as they become hypervisible and even microcelebrities, which is contrary to how infrastructural maintenance is typically described as “behind the scenes” or “invisible” work.

2 BACKGROUND AND LITERATURE

2.1 Trajectories of F/OSS research

Our study took place in 2019-2020, during an era of F/OSS that is different from the prior decades in which foundational works about F/OSS proliferated. These classic accounts (e.g. [24]) suggested that F/OSS is composed of ideologically-driven collaborative and voluntary communities, producing public goods intended to supplant proprietary software alternatives. Past work documented F/OSS’s connections to early internet engineers and makers [16], universities and academic research, and an opposition to corporate-friendly copyright and patent law [25, 76]. Academics and practitioners have discussed F/OSS as a social movement, which has splintered, with “open source” rising as a competing movement to free software, one that transformed the original anti-commercial values of free software [77].

Researchers have focused on how F/OSS contributors collaborate and organize. F/OSS projects that have an “open development model” [53] are studied alongside other “peer production” [6] communities like Wikipedia or citizen science. Unlike in firms where employees are directed by managers, these projects often rely on self-directed contributions from individuals, including individuals working between private industry and voluntary F/OSS contributor communities. Popular accounts often marvel at the relatively high quality of products produced from this ostensibly ‘anarchistic’ approach (e.g. [123], see also critiques from [114, 124, 127]). However, past work has repeatedly shown how this is made possible through less-visible coordination, articulation, and conflict resolution work, done to review, assemble, and align others’ contributions [18, 31, 46, 69, 83, 124]. On leadership in F/OSS, much work involves predicting who becomes a leader [48, 59] or leaders’ motivations [82], although past work has discussed the roles of leaders, who often resolve conflicts, mentor newcomers, set rules, and organize tasks [4, 30, 74].

In a literature review on F/OSS within and beyond Computer-Supported Cooperative Work (CSCW), Germonprez et al. [57] note that most studies in CSCW and Human-Computer Interaction (HCI) are about either “input” topics like developer motivations or “process” topics about collaboration and governance. They also detail the massive transformation in F/OSS from past decades. Early work often found that contributors were purely volunteers working in loosely-formalized quasi-organizations, operating more by ad-hoc rules. They cite more recent findings showing the
They cite more recent findings showing the\nrise of paid roles [109], corporate involvement [10, 47, 56, 100], and more formal organizational structures [20, 45] \u2014 from non-profit foundations based on fundraising to revenue-generating business models \u2014 which are increasingly the norm, especially in popular and longstanding projects. Finally, Germonprez et al. note that F/OSS projects are often studied as single-project case studies \u2014 which [32] also find in their 2012 review. However, they discuss how contemporary F/OSS projects exist in \u201ccomplex supply chains\u201d [57, p. 9], with multiple cascading interdependencies with other F/OSS projects in meso-level ecosystems.\n\nThese supply chains are not only to other F/OSS projects. F/OSS projects have complex relationships with the tech industry, governments, academia, and non-profits. Ekbia and Nardi [43] discuss the wide dependency of industrial profit on volunteer or under-compensated labor, as part of a set of practices they call \u201cheteromation.\u201d They argue the global economy relies extensively on digitally-managed forms of un- or under-compensated labor, from user-generated content to microtasking platforms to F/OSS. What has been variously called crowdsourcing [15], cognitive surplus [115], or peer production and the \u201cwealth of networks\u201d [6], Ekbia and Nardi argue can all be understood as heteromation. Computational industries that benefit from this labor are being subsidized by other institutions that support these people working cheaply or for free, ranging from welfare states, universities, family, or charity. While issues of money, financial sustainability, and corporate relationships have long been studied in F/OSS, what is less studied are the lived experiences of F/OSS maintainers as their projects become enmeshed within institutions that can have significant access to financial, social, or cultural capital.\n\n2.2 Infrastructural maintenance labor\n\nScholars have long drawn attention to how the work of maintaining technologies is often ignored and neglected. This scholarship notes the importance of maintenance in shaping the forms and functions of technologies beyond moments of invention [39, 111]. Scholars in Computer-Supported Cooperative Work have long emphasized how less visible, ostensibly \u2018non-technical\u2019 labor is crucial to the functioning of computational infrastructures, especially the \u201chuman infrastructure\u201d in scientific cyberinfrastructure [80]. This work is often underbudgeted because it is unrecognized or undervalued, but necessary to make those systems \u2018seamless\u2019 [107] to their users. Eghbal draws more on metaphors of public infrastructure such as road and other public works to suggests that F/OSS is infrastructure needs to be considered through that lens [41].\n\nFollowing Jackson\u2019s call to take maintenance and repair as the essence of technology [73], empirical studies of such practices have become more common in many areas. One theme is that tasks and roles construed as \u201cmaintenance\u201d or \u201crepair\u201d often involve responsibilities beyond the technological, particularly in maintaining and repairing social and institutional relationships [37, 66, 71, 72, 110]. 
Infrastructure and maintenance work is also often discussed alongside other work sometimes invisibilized because of gendered or classed assumptions about its nature and importance [44, 51, 70, 117].

For example, Orr’s ethnography of photocopy repair technicians showed how they can serve as the customer’s primary point of contact with the photocopy company, managing that relationship more than designated account representatives [99]. Suchman’s analyses of expert systems emerging from Xerox PARC also found that computer engineers underestimated the complexity of what secretaries do [121]. However, few in this literature have examined cases in which maintenance work becomes highly visible and even constitutive of leadership, as it can be in large-scale F/OSS projects.

While there is often an impulse to always make work visible, other literature shows how making this work more visible can come with regimes of surveillance, micromanaging, or self-censorship [14, 97, 120].

2.3 The many meanings of “scale”

We are interested in how the work of maintaining F/OSS projects changes as projects scale, but what exactly is scale? In and out of F/OSS, it is common to refer to organizations, communities, and platforms as being “small scale” or “large scale,” which often compresses many aspects into a single term. In CSCW and HCI, scale is often a synonym for the number of users, where classic work designed systems intended to operate with a pre-defined range of simultaneous users [e.g. 61]. In anthropology, Carr and Lempert [122] argue that when people use terms like “large scale” or “at scale,” they often are intuiting a kind of synthetic construct that combines multiple related but distinct measures; in our case, these include the number of users, number of user organizations, kinds of users, interdependence in an ecosystem, number of contributors, and so on.

Recent work in CSCW has similarly identified more multivalent understandings of scale. These include Lee & Paine’s “model of coordinated action” [81], in which they identify seven different dimensions along which software-mediated organizations can range: number of participants, number of communities of practice, physical/geographical distribution, nascence of routines, planned permanence, rate of turnover, and level of a/synchronicity in interactions. Our findings relate both to how these specific dimensions emerged as relevant for F/OSS maintainers and to their insight that shifts in each of these dimensions can occur independently, even as shifts in one dimension can impact or depend on all the others. Studies of scientific cyberinfrastructures more widely demonstrate this theme, where scaling also includes integrating with interdependent projects and standards [3] – often called “embeddedness” [9, 40, 117] — and supporting a wider range of use cases over longer periods of time [75]. These studies have shown the various “tensions across the scales” [105] that emerge as projects grow.

3 METHODS AND METHODOLOGY

3.1 Research methods

This qualitative research is primarily based on semi-structured interviews with 37 maintainers of F/OSS projects in 2019-2020.
Interviews lasted a median of 55 minutes and covered a range of open-ended topics, including top-level questions on: the interviewee’s personal history in F/OSS, the kinds of work they do, how their roles and participation in the project have changed over time, governance and decision-making, funding and financial sustainability, motivation/demotivation and burnout, careers, how technologies and platforms impact participation, and work/life balance.

As is common with non-random sampling in qualitative research, we sought to strategically sample for diversity across many dimensions [95], rather than seek the kind of random uniform representative sample common in survey research. We specifically chose to recruit and interview a broad set of maintainers who varied across geography, national origin, age, employment status and sector, and gender. We made these efforts to sample for demographic diversity, while reflecting on how structural problems such as the gender gap among F/OSS contributors [38] present challenges to recruiting a diverse sample. We did not originally ask interviewees their demographics, but sent a post-interview survey, which 85% completed. For gender, 19% identified as women/female, 81% as men/male, and 0% as non-binary or other. For race/ethnicity (which allowed multiple selections), 72% identified as white/Caucasian (66% exclusively so), 16% as Hispanic/Latinx, 13% as Indian/South Asian, 6% as East or Southeast Asian, 3% as Black/African, and 3% as other. Interviewees were born in 14 different countries on 5 continents; the US was the most common with 47%. Interviewees currently reside in 12 different countries on 5 continents; the US was the most common with 56%. Ages spanned 25 to 64 years old, with 53% aged 30-39 years old.

We also sampled for diversity to recruit maintainers of different kinds of F/OSS projects. The projects whose maintainers we interviewed range from having a single developer-maintainer to hundreds of contributors, and have a similar variance in terms of number of users. Some have existed for decades and have complex governance structures, roles, and norms, while others are relatively new. These projects represent a range of topical areas and locations within the technical stack, including operating systems, programming languages, software libraries, development environments, web frameworks, servers, databases, packaging, data analytics and research computing, devops, electronic arts and media, and more. Our focus on ensuring that our interview pool included maintainers from projects across these dimensions of scale follows from existing work on the importance of scale as a component of ethnographic work on CSCW infrastructures [101, 104] and broader globally-distributed phenomena [86].

Our recruitment methods included drawing on our existing personal networks, attending F/OSS conferences and events, sharing a call for participation on Twitter, and cold e-mailing F/OSS maintainers. We also conducted snowball sampling, asking our interviewees to suggest potential interviewees to us. To help our sampling for diversity, we utilized techniques similar to “trace ethnography” [55] in our recruitment methods to identify potential maintainers to recruit, based on available user data on social coding platforms including GitHub (see [34, 36, 126]). We identified core contributors through GitHub timelines, recent commits, and release notes. Our interviewees generally self-identified as current or former maintainers on their own terms.
We interviewed maintainers who hold various roles within a wide range of F/OSS projects. We particularly focused on projects that have become relied upon as infrastructure by others and either are or began as largely based on volunteer labor. However, as we asked maintainers about all the projects they had worked on, we encountered projects beyond this.

3.2 Methodology and interpretive approaches

Our interpretive approach is grounded in symbolic interactionism, which focuses on how actors organize their interactions with the world, with one another, and with themselves through categories that emerge in their social worlds [12] and in wider discourses [22]. We transcribed and inductively analyzed the interviews for themes using a grounded theory approach [23, 119], which involves a multi-stage process of coding statements with iteratively-generated themes. These themes identify social processes in common across our organizational sites, generalizing across specific, local experiences while remaining bound to the particularities of the cultures and work processes under examination.

As we conducted interviews, many participants reflected not just on their own practices, but on the practices of others, including their broader theories about the political economy and history of F/OSS [67]. As we move from research to findings, we reflect upon how participants may have related information to researchers based on what they understood the study was about and who the research might affect, and upon reflexivity as a “recursive” [76] pattern in F/OSS communities [68]. Through member-checking in the form of sharing transcripts and our findings, we gave participants opportunities to give feedback and engage with our interpretations.

Our relationship to these communities is neither purely that of outsiders nor of insiders. Our project was funded by non-profit foundations that also directly fund F/OSS projects. We are all former or current F/OSS contributors or maintainers and have regularly attended various F/OSS-related meetups and events. All the authors are ethnographers embedded in larger ongoing research projects in this area — either in F/OSS projects themselves or in organizations that rely on and/or contribute to F/OSS projects. The data we present in this paper are centered on our set of 37 interviews, although our broader ethnographic experiences have informed both the kinds of questions we asked and how we interpreted interviewees’ responses.

4 FINDINGS

4.1 What is a F/OSS maintainer and how does it change as projects scale?

“Maintainer” has a meaning within F/OSS that it rarely has in other technical domains. F/OSS maintainers do perform upkeep and repair work, but the term also usually connotes a leadership role, as [42] also discusses. This leadership role is often enacted through access permissions to the project’s code repositories: becoming a maintainer typically involves being given the technical capacity to make changes to the project’s code. This includes the capacity to approve or reject proposed changes (called “pull requests” in GitHub-style platforms) from non-maintainer contributors, as well as the capacity to moderate the issue trackers where anyone can report a bug or request a feature. Beyond this use of access permissions to formalize maintainer status, the role of a maintainer (and the use of this term) varied widely, which [42] also finds.
Like in firms, organizations, and social movements, F/OSS projects range widely in size, scale, complexity, popularity, and interdependence. This makes it difficult and unwise to make overarching generalizations. We instead illustrate how maintainership differs across different kinds of projects, particularly focusing on how the labor of maintaining a F/OSS project changes as projects develop, grow, and scale.

In projects we encountered with fewer contributors, there was usually one individual in a more singular leadership role, who leads by doing a majority of the work. The most common term interviewees used to refer to such an individual was “the maintainer,” although interviewees in corporate or academic institutions also noted that they sometimes code-switch and use titles like “project lead” or “project manager” depending on the environment. In projects with a larger number of contributors, multiple maintainers often shared these responsibilities, sometimes with formal divisions of labor. In such projects, the leadership aspect of being what is sometimes called a “core maintainer” is analogous to being on a steering committee where decisions are made by consensus or voting [112].

However, even in projects with many maintainers, there was often a primary leadership role, typically held by the original creator or someone who took responsibility after the creator departed. One common term for this is the “benevolent dictator for life” or “BDFL,” although one maintainer with such a role we interviewed described this more as “the person that tends to feel the most ownership for things that go wrong.” Finally, while most projects we encountered used “maintainer” to describe these roles, some instead used “core developer” to signify this dual upkeep-leadership role, as [42] also discusses.

In the following sections, we identify the kinds of tasks involved in F/OSS maintainer positions that change as the project grows in interdependencies, complexity, and users. Some of these only become apparent and necessary at certain scales, such as organizing events or coordinating with other F/OSS projects. Other tasks occur at all scales, but can become quite different at larger scales, such as providing support to users, fixing bugs, and developing new features. We do not intend this to be a comprehensive survey of maintenance work in F/OSS, and so focus on the relationship between scale and labor.

4.2 Maintaining Users: User support

When we asked our interviewees about the work of maintaining their projects, user support was a major topic. For the maintainer of a new project with few users, the first user who asks for help or raises an issue is a sign of validation and success. F/OSS projects are meant to be used, and a common attitude from maintainers of smaller projects was that whether the user’s issue was due to their misunderstanding or a bug in the software, the maintainer can learn something from it. Maintainers told us how some users who ask for help become contributors and co-maintainers, or alternatively donors and patrons. Yet our interviewees who maintained large, well-known projects with many users identified user support as an overwhelming and never-ending chore, particularly for projects that use GitHub-style collaboration platforms. One interviewee stated that “user support is something I do very regularly during the evenings during the week, or during the weekends.
That actually takes a large chunk of my free time.” For these maintainers, user support is an around-the-clock reality of their position.

Requests for user support come through many channels, including messages sent to their private e-mails and social media accounts. While users often seek software help in Q&A sites like StackOverflow, a project’s maintainers are not generally obligated to be present in those spaces, although some do [134]. GitHub allows any user on the web with an account to open an issue for a F/OSS project hosted on the site. The number of open issues, potentially numbering in the thousands, is prominently displayed on the project’s landing page, creating reputational pressure for the project. Managing and triaging the issue queue was often identified by our interviewees as a major task for maintainers of large-scale projects, although there was variation in the level of obligation. Some stated that maintainers may not have an obligation to actually fulfill the requests in the issue, but did have an obligation to respond and acknowledge the issue in a timely manner – sometimes described as within 24 or 48 hours, though others said that a week or more was acceptable.
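As a toy illustration of this acknowledgment norm, consider a minimal sketch in Python (the issue records, field names, and the 48-hour window are hypothetical illustrations, not any platform’s actual data model):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical issue records: when each was opened, and whether any
# maintainer has responded yet.
issues = [
    {"id": 101, "opened": datetime(2020, 3, 1, 9, 0, tzinfo=timezone.utc), "acknowledged": False},
    {"id": 102, "opened": datetime(2020, 3, 3, 14, 0, tzinfo=timezone.utc), "acknowledged": True},
    {"id": 103, "opened": datetime(2020, 3, 4, 8, 0, tzinfo=timezone.utc), "acknowledged": False},
]

# The 48-hour acknowledgment window some interviewees described.
WINDOW = timedelta(hours=48)

def needs_response(issue, now):
    """Flag unacknowledged issues that have waited longer than the window."""
    return not issue["acknowledged"] and (now - issue["opened"]) > WINDOW

now = datetime(2020, 3, 5, 12, 0, tzinfo=timezone.utc)
print([i["id"] for i in issues if needs_response(i, now)])  # [101]
```

In practice, projects at this scale lean on platform labels, bots, and dedicated triage teams rather than ad-hoc scripts, but the underlying sorting logic is similar.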
Eghbal suggests maintainers engage in the “curation” of user and contributor interactions under intense time pressure [42]. As projects grow, they often implement rules, recommendations, and even templates for raising issues. For example, it is common for larger projects to actively discourage using issues to request assistance in using properly-functioning parts of the software. However, a common tension arises when users report what they experience as bugs, but what the project’s contributors and maintainers see as the software operating properly. Maintainers of large or central projects we interviewed told us about users who were “disrespectful,” “entitled,” or “demanding” of their time and attention. This is also a growing topic of public discussion within F/OSS, with several talks and articles about how to be respectful to maintainers [19, 62, 79, 113].

As the work of investigating, triaging, and resolving issues intensified, maintainers reported feelings of demotivation, exhaustion, and burnout. This was especially common for maintainers of larger projects, who often mentioned user support as one of the more emotionally-intensive aspects of being a maintainer. As one interviewee discussed: “I think burnout can come from a lot of different things. It can come from constant bombardment of issues and notifications and you’re constantly reminded of the things that you’re not doing.” Several interviewees noted that the way users interact with contributors and maintainers was crucial, with a few kind words or less-demanding phrasing going a long way for maintainers, which [131] also finds. However, this can be complicated in the global landscape of F/OSS, with several interviewees discussing cross-cultural or language barriers.

As projects scaled, interviewees described how reciprocity affected how they felt about F/OSS work. One interviewee stated that a top priority for them would be when the user requesting support is another F/OSS contributor in a related project seeking to fix a genuine bug that affects the ability for the two projects to be used together. Interviewees also expressed enthusiasm for supporting educational institutions, such as when an educator was having issues as part of teaching the software in their class. In contrast, maintainers we interviewed expressed frustration at the user support work generated when a large tech company integrated the F/OSS project into software they were building and selling — particularly if this company had not “given back” through financial donations or in-kind donations of labor (e.g. their developers regularly contributing to F/OSS projects). One interviewee noted that demanding free-riders are not limited to for-profit corporations, as some academic researchers had behaved with similar attitudes. The difference between how maintainers framed their experiences as collective work or exploited labor related to the social and communicative relationships maintainers had with those with whom they collaborated, coordinated, or contributed.

4.3 Maintaining “Mainline” Code, Scaling Trust

F/OSS maintenance is not only about repairing and fixing. It is crucially about updating and changing to stay relevant. As the project grows, users expect a canonical version of the software, even as the number and diversity of contributors and user needs might expand. As contributors scale, maintainers must devise ways to scale trust. Version control practices are central to managing changes, especially with many contributors. The open development model of contemporary F/OSS projects typically involves a contributor making their own copy of the entire codebase, making whatever changes they see fit, and then submitting their modified version for review, approval, and merging. Traditionally, a maintainer decides what patches to accept and keeps a canonical version of the source code, regularly making public releases to keep the rest of the project up to date.

Smaller projects typically begin with a single maintainer, but as they begin to receive more and more proposed contributions, some solo maintainers give a regular contributor commit rights and maintainer status, and let them manage specific releases. However, the founding maintainer must trust this new maintainer, because by default, they have the full technical privileges to accept any proposed changes. In one case mentioned in our interviews, a solo maintainer had been unable to spend as much time maintaining a project. When someone they did not know asked to be a co-maintainer, they happily accepted. However, this new maintainer added code to the software that silently used users’ computers to mine cryptocurrency and deposit the profits in their account.

As projects scale the number of maintainers, code review processes are a common way of producing trustworthy code. Code review is the process by which one or more designated individuals who did not author the change have to approve a pull request before a maintainer can accept and merge it. The process is somewhat similar to academic peer review, especially in that there can be many cycles of review and revision between the code reviewer(s) and the original author. Code reviewers typically read through each line of code for specific issues, with contemporary social coding platforms supporting fine-grained line-level comments. Code reviewers typically look for bugs and inefficiencies, plus conformity with the project’s code style, naming conventions, and approach to modularity. In some projects, only maintainers can do code review, but others allow a wider set of trusted non-maintainers to participate.
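As a minimal sketch of the gate that code review places on merging, consider the following Python illustration (the contributor names, the two-approval threshold, and the policy itself are hypothetical, not any particular project’s or platform’s actual rules):

```python
from dataclasses import dataclass, field

@dataclass
class PullRequest:
    author: str
    approvals: set = field(default_factory=set)

# Hypothetical policy: two approvals from reviewers other than the author.
REQUIRED_APPROVALS = 2

def can_merge(pr: PullRequest, reviewers: set) -> bool:
    """A maintainer may merge only once enough non-author reviewers approve."""
    valid = {r for r in pr.approvals if r != pr.author and r in reviewers}
    return len(valid) >= REQUIRED_APPROVALS

reviewers = {"ana", "ben", "chen"}  # maintainers and trusted reviewers
pr = PullRequest(author="dana", approvals={"ana", "dana"})
print(can_merge(pr, reviewers))  # False: the author's own approval doesn't count
pr.approvals.add("ben")
print(can_merge(pr, reviewers))  # True: two valid non-author approvals
```

Which accounts count as reviewers, and how many approvals are required, is precisely where projects encode trust as they scale.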
In smaller projects, code reviews might be informal and implicit, but formally specifying such rules can be a crucial aspect of scaling a project’s number of contributors, code reviewers, maintainers, and codebase. As projects grow, maintainers have to devise ways of distributing the work of review so they do not have to recheck submissions to the mainline or canonical version. For example, the Linux kernel still uses this mode of development, in which only its creator and lead maintainer Linus Torvalds can accept patches to the official “mainline” codebase. This was much easier when the project was much smaller, but it has since grown to many thousands of contributors. The project has developed a cascading “chain of trust” [28] where a top tier of subsystem maintainers are responsible for various sections of the codebase, making the decisions about what patches to accept. Some subsystem maintainers delegate responsibility further or have their own processes for making decisions about their part of the codebase. As the Linux kernel’s documentation describes:

“...top-level maintainers will ask Linus to ‘pull’ the patches they have selected for merging from their repositories. If Linus agrees, the stream of patches will flow up into his repository, becoming part of the mainline kernel. The amount of attention that Linus pays to specific patches received in a pull operation varies. It is clear that, sometimes, he looks quite closely. But, as a general rule, Linus trusts the subsystem maintainers to not send bad patches upstream.” [28]

Version control and code reviews may seem purely technical, but they express the direction of the project. Authority to merge changes often means authority to set and enforce a specific vision of the project. Because of the necessity of keeping a single canonical code repository in this more traditional approach, the model of having a single lead maintainer who is ultimately the final decision-maker became widely prevalent in F/OSS, known as the “benevolent dictator” or “benevolent dictator for life” (BDFL). In one interview, a maintainer described a tense environment caused by the transition from a BDFL model to a more democratized system of decision-making. In essence, interpersonal relationships were strained when maintainers sought to democratize leadership roles within their project, after contributors felt their speed and progress were limited by the BDFL’s power to veto group decisions and the BDFL resisted change.

4.4 The Labor of Managing and Maintaining Donations of Labor

Those who contribute code to F/OSS projects are donating the products of their labor, but those donations also generate new work for maintainers. When discussing the subject of what changes to merge, maintainers of projects with more contributors told us about perceived mismatches in expectations from the non-maintainers who made the proposed changes (pull requests). In these cases, a non-maintainer added or expanded the code in a way they found useful. They contributed this code through a pull request, feeling they had generously donated time, effort, and intellectual property. However, from the maintainer’s perspective, the new pull requests were a heavy obligation, both in the time to review them and the long-term costs in maintaining this code indefinitely.
Wiggins [132] discusses a similar trend in citizen science as “free as in puppies,” where such donations commit the recipient to years of care, even if the value of the contribution is uncertain.

As such, our interviewees who maintained F/OSS projects with more contributors mentioned the importance of rules requiring that new code follow certain standards that make the code easier to review and maintain. Maintainers we interviewed from projects with more rules and procedures around merging changes told us about cases in which the contributor became increasingly frustrated with what the maintainer was asking them to do in order to approve the proposed changes. In some cases, the contributor abandoned their contribution and the project altogether. In addition, sometimes a pull request perfectly conforms to all rules, but would take the project in a direction that the maintainers have decided is out of scope, and is thus rejected:

“As a maintainer, you don’t just merge things. You also try to be a thought leader for what’s happening and why you’re going to go a certain direction or not — and being able to politely say ‘We don’t want to go that way’ for things you know you don’t want to. That’s the hardest thing, I think, because you can have situations where someone is earnestly trying to add something and you don’t want to shut them down, but sometimes it’s something that was declared you weren’t going to do in this project...”

Because of the open contribution model in which anyone on the web can propose new changes, the work of being a maintainer of a F/OSS project typically involves a substantial amount of labor in managing the labor and even emotions of others. For our interviewees, this emotional labor was one of the more difficult and draining aspects of their position.

4.5 Scaling through automation: continuous integration, build systems, and testing

As projects grew in contributors, “continuous integration” (CI), build systems, and code testing were ways of automating code review and testing. CI involves automated processes that build a project’s codebase across multiple platforms, then run pre-written, scripted tests to check if it is functioning as expected. Automated “linting” practices even check for proper formatting and style conventions within projects. CI services are directly integrated into GitHub-style platforms, which is one of the many ways bots and software automation can govern and transform virtual organizations [54, 107]. A 2016 survey found that 40% of the 34,000 most popular F/OSS projects on GitHub use CI, rising to 70% when examining the 500 most popular projects [63].

The number of CI tests that are run can be staggeringly large. In the Python programming language’s standard math library, there are currently 134 different “unit tests” that check the inputs and outputs of the square root (sqrt) function alone.\(^2\) Major software libraries for programming languages can have tens of thousands of tests, and programming languages themselves can have hundreds of thousands of tests. These can be computationally intensive, a point we will return to.

\(^2\)https://github.com/python/cpython/blob/master/Lib/test/cmath_testcases.txt
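To give a sense of what such tests look like, here is a small illustrative sketch in Python’s unittest style (a handful of representative checks of our own devising, far simpler than the actual CPython suite):

```python
import math
import unittest

class TestSqrt(unittest.TestCase):
    """Illustrative input/output checks for math.sqrt."""

    def test_exact_values(self):
        # Perfect squares should come back exactly.
        self.assertEqual(math.sqrt(0.0), 0.0)
        self.assertEqual(math.sqrt(4.0), 2.0)

    def test_approximate_values(self):
        # Irrational results are compared within floating-point tolerance.
        self.assertAlmostEqual(math.sqrt(2.0), 1.4142135623730951)

    def test_domain_errors(self):
        # Negative inputs must raise a math domain error, not return a value.
        with self.assertRaises(ValueError):
            math.sqrt(-1.0)

if __name__ == "__main__":
    unittest.main()
```

A CI service runs batteries of such tests automatically against every proposed change, on every supported platform.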
For many maintainers of projects at all scales, continuous integration, build systems, and testing were a major strategy they relied on to automate the labor-intensive and interpersonally-intensive tasks of code review. This was particularly the case in rejecting proposed changes:

“The other thing we’ve found is that if the computer tells them it’s wrong, they take that better than if a human does. So, the more automated things you can do that will catch low-hanging fruit, the less offense it causes people. So [the] computer says, ‘you’ve got a tab instead of spaces here,’ they don’t mind that. But if someone tells them that, they get grumpy about it.”

Maintainers also described CI and testing as a strategy they relied upon to help them manage their workloads. In several interviews, when we asked about avoiding burnout, work/life balance, or advice to give to new maintainers, these strategies were the first responses. One interviewee who helps manage a large ecosystem of projects discussed how they require these kinds of measures for all projects in that ecosystem, such that “there are packages that I maintain that I have not updated in two plus years, because things just work. When something breaks, I will get an email.”

Like all automation, these strategies redistribute and generate new forms of labor. Tests must continually be written and updated, especially when new features are added. Some projects require that new functions or features cannot be added without also adding appropriate levels of testing. The labor created by such gifts of code can thus be mitigated when contributions come with their own tests. As projects become increasingly complex, however, new kinds of integration tests must be written to check that the various subsets of the codebase work together. Testing also grows as projects become more integrated within an interdependent ecosystem of projects, all depending and relying on each other. It is increasingly common for projects to test that proposed changes do not break anything in other projects in the ecosystem. Although developing and maintaining these CI processes is itself a labor-intensive process, the work can be distributed to contributors who create automated tests for the code they write, rather than code reviewers being responsible for catching bugs. In this way, it can help distribute maintainer labor more widely as the project scales in terms of its codebase, featureset, contributor base, and interdependence within a software ecosystem.

Yet automation is computationally expensive, which can challenge F/OSS values that privilege relying on free and open platforms. A 2017 blog post about testing in the Rust programming language [2] reported that over 126,000 total tests are run for each proposed change/pull request across 20 different software configurations, with each configuration taking 2 hours of computing time. The post notes that the project only has the resources to run one or possibly a few of these test runs at the same time. Each additional pull request adds to the queue, meaning that contributors may have to wait for days to see if their proposed change breaks the testing suite. The author describes how longer queues can lead to conflict between contributors, who all need the CI system to approve their changes before they move on to the next stage of code review and approval.
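A back-of-the-envelope reading of those figures (our own arithmetic, interpreting “each” as each configuration) shows why the queue stretches into days:

```python
# Rough queue arithmetic based on the figures reported for Rust's CI [2].
configurations = 20          # software configurations tested per pull request
hours_per_configuration = 2  # computing time per configuration

compute_hours_per_pr = configurations * hours_per_configuration
print(compute_hours_per_pr)  # 40 compute-hours per pull request

# With resources to run only one configuration at a time, five pull
# requests waiting in the queue imply roughly 200 hours -- more than a
# week of wall-clock time -- before the last contributor gets a result.
queue_length = 5
print(queue_length * compute_hours_per_pr)  # 200
```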
The only way for CI in some of these projects to scale up is to use either commercial cloud computing or a self-hosted server cluster. Contributors are supposed to run the full test suite on their own computer before submitting pull requests, but it is important for the project to test it in a wide range of configurations, as well as have a common public infrastructure that verifies the tests actually passed. GNU GCC, part of the free software movement, maintains its own distributed “compile farm,” with donated servers hosted by those in the movement.\(^3\) Commercial CI platforms have been growing, including Microsoft’s Azure Pipelines (which runs on Microsoft’s cloud computing infrastructure) and services from the venture-capital-funded companies CircleCI and AppVeyor (which run on cloud computing infrastructure from Google, Amazon, or Microsoft). These commercial CI platforms often give public F/OSS projects a free single CPU to run a single test at a time (Microsoft Azure currently gives 10 free simultaneous tests to F/OSS projects), but charge for more simultaneous tests – a necessity as projects grow in complexity. These commercial CI infrastructures challenge open source cultures that privilege creating autonomous, freely available infrastructures using freely available infrastructures, what anthropologist Chris Kelty has called a “recursive public” [76].

\(^3\)https://cfarm.tetaneutral.net/

The decision to go beyond the free tier of CI services can be the first time a F/OSS project takes on a recurring financial expense, driving organizational changes and work. For smaller projects, free tiers are often sufficient. As projects grow their codebase and contributor base, they have to decide whether to fundraise to pay for more simultaneous tests or to deal with the strains of not being able to easily verify that proposed changes do not break the project. For those who fundraise for more CI resources, this can require projects to add accounting and financial roles (although events are more often the first occasion for this, which we discuss in a later section). Even the non-commercial, self-hosted alternative that projects like GNU GCC follow requires dedicated maintenance roles and the soliciting of donations to fund the self-hosted compile farm.

4.6 Ecosystem work: from interdependence to competition

One major dimension in which F/OSS projects scale is their interdependence with other F/OSS projects, which can take a variety of forms. First, they can become relied upon as critical infrastructure by other F/OSS projects. This is especially the case for software libraries, programming languages, and operating systems. Here, the number of users typically means the number of other developers building software using the F/OSS project. The chain of cascading dependencies can grow quite complex: a program may only rely on a few explicitly imported dependencies, but those projects and their dependencies can make the chain hundreds of projects long.
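As a toy illustration of how these chains compound, consider the following Python sketch (the package names are invented; real package managers resolve versions and conflicts on top of this basic graph traversal):

```python
# A toy dependency graph: each (invented) package maps to the packages
# it directly depends on.
DEPENDENCIES = {
    "myapp": ["webframework", "dataplot"],
    "webframework": ["httplib", "templating"],
    "dataplot": ["numlib"],
    "httplib": ["socklib", "tlslib"],
    "templating": [],
    "numlib": ["socklib"],
    "socklib": [],
    "tlslib": ["cryptolib"],
    "cryptolib": [],
}

def transitive_dependencies(package):
    """Return every package reachable from `package`'s direct dependencies."""
    seen = set()
    stack = list(DEPENDENCIES.get(package, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(DEPENDENCIES.get(dep, []))
    return seen

print(len(DEPENDENCIES["myapp"]))                # 2 direct dependencies
print(sorted(transitive_dependencies("myapp")))  # 8 packages in total
```

In a real ecosystem, the same traversal over hundreds of published packages is what turns a two-line import into a chain of hundreds of projects, each with its own maintainers, release schedules, and bugs.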
Maintainers must manage relationships with projects that are both “upstream” (that they depend on) and “downstream” (that depend on them). If a project modifies a feature, it can break downstream projects that expect that feature to work consistently. This is one common use for continuous integration, in which a project will regularly test its own functionality on beta or release versions of upstream projects it relies on. There are many issues that arise in interdependent software ecosystems beyond these more fine-grained issues around compatibility between new versions of software. One maintainer shared the difficulties that arose when a project they depended on began to face internal conflict, which forced them to either pick a side or do additional work to maintain compatibility with both projects.

Complex and interdependent F/OSS software ecosystems often find themselves needing to coordinate on high-level tasks and decisions. Conferences and conventions can be a major site for this work. Some interviewees even described these ecosystem-level conferences in terms similar to political delegations, such as referring to a perceived need to send representatives. Some have large blocks of time dedicated to open discussions on topics relevant to projects across the ecosystem. Such ecosystem-level topics include software-specific issues that would require consensus to implement new features across an ecosystem, such as packaging and release management, data types, hardware support, or user telemetry.

Another major, perennial, and often controversial ecosystem-level topic is the proposed consolidation of related or competing projects within the same ecosystem. It is common in F/OSS ecosystems for many projects to be created that solve a similar problem. There may be good reasons for multiple related competing projects to circulate in the same ecosystem, but navigating a crowded ecosystem can be confusing and frustrating for both users and developers. In such ecosystems, it can be common for someone to suggest that there are too many competing packages and that consolidation is needed.

One interviewee shared a case where, during an open discussion session at an ecosystem-level conference, the maintainer of one project declared that some of the other competing projects in this ecosystem “needed to die,” seeking to gain support for consolidation then and there. No consensus was reached, but the episode shows how high-intensity communicative work and representation at such meetings become a way for those vying to keep their projects alive within an ecosystem, and to secure funds. A project’s maintainers can certainly decide to keep their project operating in an ecosystem that has decided to consolidate around another competing project, but they will find themselves with fewer and fewer users.

4.7 Growing a Community by Evangelizing

At a developer conference we attended for a particular subset of a F/OSS ecosystem, we observed a dedicated plenary session for 2-4 minute lightning talks, which were almost all pitches for a F/OSS project the speaker had developed. Many projects pitched were newer, less-established, and had not secured spots at the competitive conference, which some speakers noted. Maintainers asked for others to use their projects, with one explicitly imploring the audience to “rely upon” their project: to integrate it into their workflows and F/OSS projects. We asked about the role of this ritual: Why is it important to have a space for maintainers to convince others to use their often-fledgling F/OSS projects? And why did so many of them take the form of a “rely upon me” pitch, particularly when their project was not quite yet fully developed?

Our interviewees called these “rely upon us” pitches “evangelizing” and they brought it up frequently, in response to questions about scaling or even about general strategies for maintaining a project.
In computing in general and F/OSS specifically, evangelizing is widely used to describe efforts to sustain and maintain projects by bringing in more people [1, 85]. Maintainers repeatedly told us how important it was to have fellow contributors and maintainers to distribute the work and make their project sustainable. A key rationale was that users who rely on projects presumably become invested in those same projects’ success — particularly when those users are also F/OSS developers or well-resourced organizations. When a project is used and relied upon by programmers and software firms, it gains access to a potential pool of skilled labor and resources. More contributors and users also make projects appear more successful [5], and thus worthy of funding by entities that fund F/OSS. Like many startups, projects often signal their credibility by displaying on their websites the logos of well-known companies and universities that rely on their projects.

The tasks which constitute this vast domain of evangelizing can include: developing social media accounts for F/OSS projects, maintaining educational resources and documentation, building and updating websites with F/OSS project information, moderating and building Q&A sites and forums, and giving talks at meetups, conferences, companies, and schools. The promotional, communicational, educational, and evangelical activities done by F/OSS maintainers lay the conditions for further expansion and infrastructural change — and more maintenance work. Some interviewees said they deeply enjoyed evangelizing work, while others described it as an exhausting task outside of their expertise. We also heard that highly-visible evangelizer-maintainers can receive personal credit for all the group effort in a project. This Matthew effect [93] of accumulated status can generate tensions in the project, even if these maintainers do not intend for this to happen and actively work to elevate other contributors and maintainers.

4.8 Building and Maintaining Relationships: Meetups and Events

It is common for F/OSS projects to hold in-person meetups and events, often to get new and/or existing contributors together to accomplish work and build relationships. These events vary widely, from conferences to hackathons to happy hours. Our findings align with existing work that has also found these events play a critical role in developing trust and maintaining positive, lasting social relations [90, 91, 104]. Maintainers described in-person events as having an essential function and critical value in helping them build good relationships, which in turn help them better understand each other in virtual environments [26]. Several longtime maintainers of major projects told us stories of their first F/OSS event, which they claimed inspired them to get even more involved in F/OSS in a way that years of online conversations had not. Past work has described the gendered labor women take on to organize events, which often goes unacknowledged when technical contributions are valued above social and organizational work [92], which our participants also reflected upon.

Projects with many contributors, resources, and institutional connections often run their own major conferences, while smaller projects often meet up in more ad-hoc events. Smaller projects also rely on ecosystem-level conferences, in which those from various related projects (e.g. those written in the same programming language or that serve similar purposes) organize collective events.
These ecosystem-level conferences often have dedicated periods for projects of various sizes to hold their own events. As projects grow their number of contributors, many move from holding satellite events before or after major F/OSS conferences to holding their own conferences.

In both our interviews and our observation of these events, we found that maintainers were often key organizers at events for smaller projects, although many larger projects with more capacity to fundraise hire dedicated event organizers. Projects that make connections to companies or universities often get in-kind donations of space and event organizing labor. Major projects with many users, contributors, and maintainers hold events that are more like trade conventions, with thousands of attendees and high competition for speaking slots. Some companies even specialize in hosting F/OSS conferences on behalf of projects. The companies then take a portion of the registration fees, which for larger projects can be over $1000 USD.

Interviewees told us it was often easier to get companies to fund events than anything else – testing infrastructure, time for labor, or the other costs that can accrue for a project. Many ecosystem-level F/OSS events are directly sponsored by companies that either rely extensively on F/OSS projects or are business arms of F/OSS projects. Maintainers discussed the labor associated with events as projects scale, as holding events for more and more people becomes increasingly difficult and costly. We also heard about the struggles maintainers face when the project has users, contributors, and maintainers across the world, which is another form of scaling that is often seen as a key metric of success. In-person events for a global community require more work, skills, and resources, including tasks like arranging visas and fundraising for travel grants.

Some maintainers we interviewed shared how they spent significant amounts of their personal money on events they organized, particularly when funding promises fell through or when costs exceeded budgets. While expenses were sometimes reimbursed through donations, the actual labor of organizing events was less likely to be compensated or recognized. Some maintainers took issue with restrictions from donors who would not fund stipends or even travel funds for those in their projects who organized an event, even if excess funds were available for expenses such as attendee travel, venues, and catering.

4.9 Funding, finances, and donations

Funding, finances, and donations were a major topic in our interviews, and they have become widely discussed in broader public conversations in F/OSS. Our interviewees described funding as a way to compensate for labor already performed in maintaining F/OSS projects, as well as to pay contributors to perform tasks that are not done voluntarily. However, we found that fundraising and fund management itself can involve substantial amounts of unanticipated and specialized labor, including seeking funding, writing proposals and budgets, accounting, reimbursements, and managing relationships with funders. There is a strong parallel here to other non-profit sectors, including academic research, charities, and political organizations, where the work around funding can become a substantial fraction of the work performed in maintaining an organization.

In academic research, however, scientists learn through their training that running a lab means maintaining a funding pipeline through networking, applications, and rejections [96].
Maintainers not trained in this system can be caught unprepared by the mismatch between their vision for the project and the reality of getting funding. Even for those who do not receive funding, the mere availability of potential funding can shift maintainers’ and projects’ conceptions of what F/OSS is and how their life and work fits into it. Funding can also push projects to develop more formal organizational structures, which in some cases can be at odds with their existing governance style and ethos.

4.9.1 Fundraising for maintenance: patronage and business models.

We found two general approaches maintainers took to funding: patronage models and business models. In patronage, maintainers solicit donations or grants, while business models involve a range of strategies to sell services on top of F/OSS projects. While it may seem more obvious that building up a business involves a substantial amount of work and start-up costs, patronage models can also involve massive amounts of work in finding patrons and maintaining good relationships with them. Companies, foundations, governments, and individuals all donate, but can have idiosyncratic processes around those donations. Managing the patron relationship can be a complex task as projects gain more patrons, particularly if patrons have contradictory expectations.

In our interviews, maintainers of projects that had funding to regularly hire multiple full-time employees expressed a common sentiment: they found themselves doing less and less work on the software project itself, and more and more work on seeking funding and managing the project. While we especially saw this with maintainers of large-scale projects in academic settings, we also encountered it in non-academic projects. Interviewees who actively sought grants and patronage told us that both grant agencies and other patrons often only fund novelty and new features, not the necessary upkeep, repair, security, and compatibility work. Maintainers struggle with producing the visions of novel innovation needed to get funding, which is a common theme in scientific cyberinfrastructure [9] and public works [39, 111].

This work around soliciting funding is not just about the raw amount of time or energy that maintainers spend trying to write proposals or find interested donors. There can be a heavy personal burden when maintainers become responsible for the livelihoods and careers of real people they have hired. One of our interviewees — an academic researcher whose grants fund employees to work on F/OSS — explained how getting funding can create an obligation to continue to get funding, to keep supporting the people they hired. We particularly heard this in more academic-aligned F/OSS projects, where hiring graduate students or postdocs to work on F/OSS projects is common.

4.9.2 Money changes everything: the labor of spending.

Once funding has been obtained and money is in some kind of account, the question of distribution and governance arises. Smaller projects grapple with learning how to navigate non-profit and for-profit laws around hiring, accounting, and taxes, requiring the project to bring in more kinds of expertise. As projects fundraise, maintainers can find themselves with obligations to expand their decision-making to include funders or those chosen by funders. This can be informal for smaller projects, but becomes more explicit as projects scale their fundraising.
For example, the Linux Foundation has a Platinum membership level that costs $500,000 USD annually, and its corporate charter holds that about 80% of its Board of Directors are chosen by the Platinum members [50].

With projects that receive grants from more traditional foundations (whether private or public), the grant proposal already specifies how the funds will be spent. However, many projects receive more ad-hoc funding from donors who do not require extensive budgeted proposals, especially those that solicit funding through Patreon-style platforms like OpenCollective or GitHub Sponsors. As one maintainer explained, getting the funds was the easiest part:

“...we created an OpenCollective, and a bunch of companies have contributed to it, but we didn’t really address the issue of how to disburse the funds. [...] No real system for figuring out how to spend it. [...] Do we pay them [contributors] money for that one pull request that they did? [...] The problem doesn’t solve itself just by the existence of money that’s available for the project. There still has to be a mechanism and a policy for, like, distributing it among the people on the project.”

The introduction of money into the project can bring the social relations of collaboration into conflict with labor and trade agreements. F/OSS projects often try to hire their long-time volunteer contributors, no matter where they live. This means navigating labor laws, varied immigration statuses, banking networks, or sanctions, which can be far more restrictive for some kinds of contributors than they are for others. Funding, then, transforms the structure of the organization, the possible formations of open source community, and what kinds of collaboration can sustain the maintenance of the project.

5 DISCUSSION: LABOR AND SCALE IN MAINTAINERSHIP

Our findings speak to two distinct but linked issues in F/OSS: labor and scale. Before we discuss more specific implications of our findings, we reflect on what we mean by the multi-faceted term “scale.” Through our interviews about labor, it became apparent that scale was clearly important: interviewees introduced it in response to a wide range of questions about many different aspects of their work and positions. Based on our interviews, “scale” can refer to: the number of people who use the software; the use of the software within large and/or prestigious organizations; the number of contributors or maintainers in the project; the number of bug reports, issues, and/or proposed changes made; the geographic distribution of users, contributors, and/or maintainers; the amount of rules and governance procedures; the number of communication channels used; the amount and/or rate of internal and external communication; the size, complexity, or features of the software code; and the interdependence of the code within a broader software ecosystem. Scale was also invoked as a more holistic feeling, particularly by those who described how they felt their project had grown too much too fast, making scale closer to a signifier of affect, as [122] also describe. Our findings advance work that interprets scale as a multidimensional quality beyond the number of users/participants [81], and methodologies that use participants’ multiple understandings of scale as an analytic resource [86, 104].
Table 1. Summary of forms of work, with examples of how they change as F/OSS projects scale.

| | At smaller scales | At larger scales |
|---|---|---|
| **The maintainer(s)** | Solo or lead maintainer who makes all or most decisions, often by doing most of the work on their own. | Many maintainers with various divisions of labor, hierarchies, and organizational structures. |
| **User support** | An opportunity to retain and recruit new contributors. Work is ad-hoc. | An overwhelming flood. Work has established rules, teams, and triaging. |
| **Managing software development** | Governance is often implicit and led by a lead/solo maintainer, who accepts or rejects all proposed changes. | Governance is often explicitly discussed, with a variety of formal rules and structures for decision-making. |
| **Code review and testing** | Either no automated tests or lightweight tests managed by the lead/solo maintainer. | Widespread use of tests to review proposed changes and enforce rules. Managing testing is a dedicated role. |
| **Ecosystem-level work** | Projects may rely on other more-established projects and have to adapt to changes made “upstream.” | Projects are embedded in an interdependent ecosystem, which must coordinate to ensure compatibility. |
| **Evangelizing** | A crucial task to get new users and contributors. Lead/solo maintainer must work to get speaking spots. | Maintainers are routinely invited to speak at conferences and prestigious organizations; some are celebrities. |
| **Meetings and events** | Smaller events focused on growing the user and contributor base, often organized by the lead/solo maintainer with little financial support. | Larger events that let contributors and maintainers coordinate and build relationships. Dedicated organizing roles with financial support. |
| **Funding and finances** | Small to non-existent. All work is uncompensated, but projects may receive donations for small expenses. | Routine and successful enough to hire contributors, maintainers, and accountants. Debates over how to raise and spend funds. |

5.1 As projects scale, work not only increases, but fundamentally changes

As we summarize in Table 1, various kinds of work and positions of labor in F/OSS can become quite different at smaller and larger scales. Our findings extend prior work that investigates the different modes of scaling in technologically-mediated organizations and scientific cyberinfrastructure [3, 9, 75]. These similarities are potentially due to the fact that many F/OSS projects are also relied upon by decentralized communities (including science), where well-resourced user organizations and grant agencies contribute to their development and maintenance in a more ad-hoc fashion.

Carr and Lempert discuss how scale is not merely a matter of existing activities being amplified, but of work being fundamentally transformed by scale, because scale is deeply linked to power relations [122]. As F/OSS projects grow across many different understandings of scale, we showed how the kind of work involved in maintaining them also changes.
For instance, in the findings we referred to an interviewee who shared how the growth of contributors to their F/OSS project necessitated the formalization and democratization of leadership positions, when the “benevolent dictator” model was no longer sufficient because it required all decisions to be approved by one person.

It is not simply that as projects grow, there is more work to be done, although this is also the case. New kinds of work are often needed and existing kinds of work become transformed. For example, for a project with few users, providing user support to someone who raises an issue can be an exciting opportunity to grow the userbase. The sole maintainer likely does this work themselves, and has the capacity to attend to individual concerns. As F/OSS projects gain thousands or even millions of users, maintainers often must implement distributed approaches, like directing questions to Q&A sites or forming teams who solely triage the issue queue. This is also the case with continuous integration (CI): as projects grow their codebase, interdependence, and contributor base, more tests must be run, which can exceed the free allowances of commercial CI offerings. This may initially seem to be a purely “technical” challenge, but it raises questions about fundraising and organizational roles.

The CI example also illustrates the deeply socio-technical nature of work, which has long been an established concept in CSCW and organization studies [7, 98, 108, 133]. Our findings extend the literature on how maintenance and repair practices are bound up in social relationships [37, 66, 71, 72, 99, 110, 120]. One way of understanding this implication is that there is no “purely technical” work in F/OSS that only requires software engineering expertise, as all forms of work have interpersonal and organizational dimensions, even if those are often implicit.

5.2 Scalar labor: What is needed to grow in many directions?

Many F/OSS projects have a small user base and do not grow beyond a single maintainer-contributor [42], but as the prior section discussed, those that do become widely relied upon as infrastructure must be maintained with ever more work and with different kinds of work. When this occurs, the maintainer(s) must constantly ensure there are enough people available and willing to work on what the project needs. Those people must also have the skills, resources, institutional knowledge, and organizational forms necessary for them to do that work well. We introduce the term “scalar labor” to describe these kinds of work that seek to ensure the project has the capacity to meet its many growing needs, across the many dimensions in which the project may scale. We use “scalar” primarily as an adjective form of scale, but its mathematical meaning of magnitude without a specified direction can be an apt metaphor in F/OSS, particularly for projects that achieve what one interviewee called “catastrophic success.”

The concept of scalar labor overlaps with Ribes’s focus on “scalar devices” [104] in his studies of scientific cyberinfrastructure, which are the tools and practices that people in organizations use to understand and manage the size, scope, and spread of the organization.
These include surveys, all-hands meetings, analyses of logs and digital traces, and other “little technologies of community.” Like other works on scaling, Ribes discusses the many heterogeneous dimensions that people tend to compress into the single term. While Ribes’s article is more methodologically focused on how ethnographers can study such organizations through scalar devices, we found many of the same empirical patterns in our ethnography of a similar kind of process, in a different set of social worlds.

In both F/OSS and scientific cyberinfrastructure, there can be a prevalent assumption that scaling-up is an inherent good, which is rewarded by funders who support projects that demonstrate successful scaling. Ribes also discusses how it can be far more difficult to manage projects as they seek to scale up, particularly when scaling into becoming infrastructure for an entire academic discipline. Ribes’s “scalar device” is what sociologists call a sensitizing concept [11]: it draws our attention more to knowing what scale an organization is at now, what it could be at in the future, and how those within it know the organization. Scalar labor, by contrast, draws our attention more to how this work transforms with a changing organizational, economic, and institutional context. Labor also calls attention to who does this work, who is recognized for doing this work, what it costs them, and how they are compensated or made whole for doing it.

Scalar labor is also related to Strauss’s concept of “articulation work” [118], which Gerson summarizes as “making sure all the various resources needed to accomplish something are in place and functioning where and when they’re needed” [58]. Bietz et al.’s study of scientific cyberinfrastructure [9] introduces a related extension of articulation work in “synergizing,” which is the work of creating and maintaining a common field where quite different kinds of people, organizations, and systems can do articulation work. Synergizing calls our attention to how this work is impacted by the heterogeneity of interdependent people, organizations, and systems that must be coordinated, which was certainly also a theme in our findings. By emphasizing the labor dimension of F/OSS, scalar labor covers a similar range of activities as synergizing, but draws our attention to how this work is specifically impacted by the heterogeneity of different dimensions of growth, of which interdependence is but one mode of scaling.

Like articulation work and synergizing, scalar labor is complex and interdependent. A prime example of these phenomena is raising funds to host an event to evangelize to new users, who would then be mentored into contributors, who would then respond to bug reports and fix identified issues, and may even be mentored into maintainers themselves. In this example, a traditional software engineering firm that made money from selling licenses or services would simply hire someone directly to respond to bug reports and fix such issues. Some F/OSS projects with significant fundraising capacity do exactly this, which can relieve major burdens. However, most projects we encountered could only recruit volunteers, because they cannot charge for the free software and also struggle to achieve the status that would help them fundraise.
In either case, the concept of scalar labor draws our attention to how growth is not sought for its own sake, but rather as a strategy to build capacity so that important maintenance work can be done. Yet because this growth itself can bring more and different kinds of work, even more growth may be needed to do that work.

Also like articulation work and synergizing, scalar labor is a useful concept for studying F/OSS (and other organizations that produce and maintain infrastructure) because it includes interpersonal, organizational, and financial skills that are often far outside the scope of a traditional engineer’s duties — even though it is done to improve the project’s capacity to do traditional engineering work. Yet this work also often requires project-specific knowledge and the trust of contributors, which makes it difficult to delegate. This work can become what maintainers call “governance” – a longstanding topic in F/OSS research (e.g. [87]). Yet despite such a focus on governance, governance work is rarely analyzed as a form of labor, perhaps because it is seen to be more about organizational forms or decision-making. Kelty [76] is a notable exception, as he details how free software projects often make and enact governance decisions through software engineering, and recognize those engineering decisions as constituting social values. The concept of scalar labor calls our attention to governance as a form of labor, which can be as much an exhausting, uncompensated, and invisible burden as it is the exercise of power.

5.3 Scalar debt and the consequences of ‘catastrophic success’

The concept of scalar labor leads to a related issue: many projects that become widely relied upon accumulate what we call “scalar debt,” a term we introduce based on the concept of “technical debt” [33]. Technical debt refers to engineering decisions that initially help a project advance or expand quickly, but at a cost that must be ‘paid back’ later with even more work than was initially saved. Projects that grow rapidly and achieve what one interviewee called “catastrophic success” struggle in that they have not done enough scalar labor; that is, their growth in users has not led to a growth in the project’s capacity to maintain itself at this new scale. Paying down this scalar debt then comes at an immense cost to the maintainers; in contemporary F/OSS parlance, the remedy is often referred to as finding a working “sustainability model” — a way to recruit new volunteers or raise funds to hire contributors and maintainers to do work that is currently backlogged. This focus on the present is different from how “sustainability” is discussed in scientific cyberinfrastructure, where it usually refers to questions about whether the infrastructure will persist in the long term, often decades into the future [105, 106].

Scalar debt is also related to how F/OSS projects tend to develop management and governance structures on an ad-hoc basis over time, rather than preemptively planning them out before they are needed. This ad-hoc or “spontaneous” [35] governance model is common in F/OSS, as well as in other peer production platforms [17, 124]. Researchers and practitioners in F/OSS have identified this as a key strength [21, 130] — with some comparisons to the now-dominant Agile software development methodology [52, 129].
This ad-hocness in F/OSS has also been described as a product of shared ideological and cultural commitments: community members may object to formalization and instead value autonomy and more distributed governance models [27, 35].

However, this ad-hocness is also related to the resources and labor available to projects in the mostly-voluntary peer-production model with which the F/OSS projects we studied began. In our interviews, maintainers shared their concerns about how such formalization and bureaucratization can demand substantial time, energy, and social capital, and carry real risk, meaning there are very good reasons why a project may not want to create such structures unless they are clearly necessary. However, one key case of scalar debt we encountered in multiple interviews was around Codes of Conduct and other moderation mechanisms, which have become particularly central to discussions of diversity and inclusion in F/OSS [38].

It can be difficult to know when a project is taking on scalar debt, although once this becomes apparent, maintainers described living in a constant state of incipient crisis, overwork, or burnout just to keep the project from falling too far backwards. Ironically, when projects are in this state of “putting out fires,” more work must be done to get and manage the resources necessary to help put out those fires, whether this involves recruiting and mentoring new volunteer contributors or raising funds to hire employees. The success of these sustainability approaches is not guaranteed: volunteers can leave, patrons can withdraw support, grants can be rejected, and business models can fail to be profitable. For maintainers in the overwhelming ‘putting out fires’ state, it can be a difficult choice whether to spend their scarce time and energy on an unproven strategy for leveraging more resources for the project, or on the fires currently flaring up. As Carr and Lempert [122] and Ribes [104] also discuss in non-F/OSS contexts, projects have to continuously recalibrate what scale they are currently at, which can itself be an uncertain and labor-intensive task.

5.4 Scaling into becoming critical infrastructure for well-resourced organizations

The issue of scalar debt leads to a related challenge that becomes apparent when a project scales into being relied upon as critical infrastructure by well-resourced organizations, from for-profit companies to universities to governments. As we previously reviewed, entire sectors of the global economy are now reliant on F/OSS projects. We use the term “labor” intentionally in this paper, which calls attention to how this work is part of the economy, contributing to the production and distribution of goods and services. In contrast, earlier work on F/OSS often emphasized the voluntary, altruistic, and alternative nature of this work, framing these projects more as communities. While the rising commercialization of F/OSS as a movement is a long and well-studied historical trend over the past two decades [10, 47, 77], we find that contemporary projects which begin as more voluntary, peer-production efforts can similarly transform as they scale. Early on, many smaller and more voluntary F/OSS projects seek to be relied upon, especially by large, well-known tech companies and universities, who can directly or indirectly help support the project.
This can bring not only users (who may become contributors), but also prestige, connections, and a pittance of donations — that is, cultural, social, and financial capital [13].

Yet several maintainers we interviewed described it as a “blessing and a curse” to be relied upon by organizations that build their products on their projects. The companies benefited from their labor, but often did not offer resources or labor in return. We often heard how being relied upon in such a manner adds more work that is also more difficult; software engineers at elite user organizations can be especially demanding and entitled. Maintainers must respond to these elite users in ways addressed in studies of emotional labor, that is, by managing the emotions of others [65]. Many such user organizations are “free riders” who do not contribute back. Even when some corporations contribute back to F/OSS, they can do so in ways that place additional demands on maintainers, by requiring additional code review or by asking F/OSS contributors to manage relationships with patrons. These are all ways in which becoming critical infrastructure for well-resourced organizations can increase and transform the work of maintainers, even as it brings users, resources, and prestige.

Another issue that arises is that becoming relied upon as infrastructure by others can make maintainers feel morally responsible for whatever products are built using their projects. Some described becoming disillusioned or burned out specifically because they began their project imagining it would be used by those who could not afford commercial alternatives, but found it was more used by companies to lower their own costs. Some even expressed that they had to wrestle with how their projects were used as part of products they believed were unethical or harmful. These frustrations are compounded by the fact that the wealth derived by companies relying upon their technologies is not shared with maintainers and contributors, evidencing an inequitable form of extraction. The activities around mentally working through such issues can also be seen as a form of both invisible work and scalar labor.

5.5 Scaling and the dynamics of hypervisibility

Scale impacts maintainers’ personal identities and relationships to broader publics. While much prior literature on maintenance and infrastructure in other sectors (e.g. power plants, transportation, commercial software) discusses maintenance as relatively less-visible and less-recognized work [39, 111, 117, 120], the centrality of maintainers as leaders of F/OSS projects leads to a different set of issues. Much of F/OSS work is done in public view because of the open nature of F/OSS work and especially the dominance of all-inclusive public code collaboration platforms like GitHub. As discussed in the sections on user support and proposed changes, maintainers can receive a deluge of requests from users and contributors, all of which are visible on the public web. Under public scrutiny, maintainers engage in the communicative labor of managing issue trackers, the emotional labor of responding to users and contributors, and, in sum, the production of the optics of a successful project.

Maintainers of projects that have achieved massive success and scale — that are widely relied upon and/or have a large contributor base — can achieve a kind of “microcelebrity” [89] status, a term originally from studies of social media.
Eghbal compares F/OSS maintainers to content creators on YouTube or Instagram, particularly those who earn a quasi-independent living through patronage [42]. We found that some maintainers grow into microcelebrities, fueled by the dynamics of social media and technology standardization. Such maintainers have hundreds of thousands of followers on social media sites like Twitter, write widely-read blog posts on the state of F/OSS, and are flown out to speak at major conferences and companies. Such maintainers can play a major role in conflict resolution and governance issues on public platforms such as mailing lists, particularly when the governance model is more ad-hoc. These influential leaders can become substitutes for the reorganization of project decision-making and conflict resolution – a case of scalar debt.

Evangelizing can reinforce this trend towards hypervisible microcelebrity maintainers, as when an already-famous maintainer is invited to give talks at F/OSS conferences to thousands of people, or flown out to give talks at companies and universities, which makes them even more famous. Funding, patronage, and business models benefit from more famous figureheads, and often require that a single individual be the designated Principal Investigator on the grant or the CEO of the business arm. Some microcelebrity maintainers told us they have to actively work against these dynamics, such as by sending others in their place when asked to speak at conferences.

Yet there are still forms of invisible work for the hypervisible. Such maintainers routinely receive torrents of unsolicited e-mails and private messages, from lavish praise to harassment. Much work also takes place outside of public code platforms, such as grant writing or conflict resolution. Precisely because of their microcelebrity, these maintainers can be called in to adjudicate disputes behind the scenes. These findings suggest that maintenance labor is not always invisible; it can be hypervisible and highly valued. Given the dominant framing of maintenance and infrastructure as invisible work [60, 116, 128], we urge future research into this intersection of issues.

6 CONCLUSION

The focus of this paper has been on the intersection of labor and scale in the context of maintaining F/OSS projects, but our findings contribute to understanding the challenges faced by people engaging in many types of collaborative work to build common information resources while simultaneously developing organizations and governance structures. In our interviews, maintainers described being burned out as what was expected of them fundamentally changed while projects scaled. These interviews were rich with insights into the deep and varied commitments of F/OSS maintainers, but also into the emotional toll that doing F/OSS work can take. Our findings have wide import for discussions of governance, leadership, and sustainability in socio-technical systems, including crowdsourcing, citizen science, scientific cyberinfrastructure, and crisis informatics.
In particular, our focus on labor, and on people’s reactions to changes in their labor, can help build awareness of how infrastructure sustainability is tied to the long-term well-being of maintainers, as individuals and in their communities.

6.1 Limitations

Although we attempted to recruit a diverse group of participants for our interviews — with particular attention to the type/size of F/OSS project they worked on, employment, and geography — our findings are limited by the number of interviews we conducted and our recruitment methods. We have mostly studied projects that have been relied upon by others as infrastructure and that began as volunteer projects, so our findings do not speak to the overwhelming majority of F/OSS projects that are developed and used by a single person but released publicly, nor to entirely corporate-developed F/OSS projects. We have also sought to capture a kind of longitudinal view by focusing on maintainers, some of whom have long histories of involvement. A more traditional longitudinal study would capture these issues of scale with even more depth. As in all interview-based studies, memories may be imperfect, so this study could be complemented with more detailed contemporaneous methods of capturing the work that maintainers do day-to-day, from participant-observation to diary studies to analyses of trace data.

We also acknowledge that we are implicated in the same kinds of systems of F/OSS sustainability as our participants. All authors have direct participant experience in F/OSS projects as contributors or maintainers, which gives us a sensitivity to these topics, but also means that we can lack some of the analytical distance that some strands of social science value. In particular, the fact that we were funded to study these issues by non-profit foundations that are also direct funders of F/OSS projects — which was public knowledge that we disclosed prior to our interviews — may impact the kinds of responses we received.

6.2 Recommendations and future work

Contributors and maintainers might better manage difficulties posed by scale if they regularly have conversations about what responsibilities entail, how much time and effort that work takes, and how the distribution of workloads and resources should change when the project changes. Maintainers may benefit from explicitly acknowledging when scalar debt is being taken on, much as engineering teams sometimes acknowledge when technical debt is being taken on. Focusing on these questions of scalar labor brings to light how scale is not always a universally good thing — even though there are broad pressures on projects that equate scale with success, as Ribes [104] also discusses in science. The benefits of scaling and success may also not be equitably distributed, as we discussed around the less visible and more gendered labor of event organizing, versus the dynamics that lead to microcelebrity maintainers. Finally, because efforts to build capacity and reduce the burdens of maintenance work can themselves compound the amount of work to be done, funders and donors should be mindful of the opportunity costs that projects incur in soliciting resources. This can involve more lightweight funding mechanisms that require less up-front work on the part of maintainers and project leaders.

Many areas in this paper might be expanded in future work. Specifically, we are interested in unpacking the effects of corporate reliance on F/OSS projects on maintainers’ working and emotional lives.
Although we brought in value misalignment as one way to interpret maintainers’ reactions when corporations took but didn’t give back to F/OSS, we believe more work can be done in this area to understand the political economy of value misalignment and the effects of corporate reliance on maintainers’ mental health and well-being. This might involve conducting additional interviews that focus on projects’ growth trajectories or focusing on projects that experienced the ‘catastrophic success’ gestured to in the discussion. Further exploring these areas might contribute valuable and actionable insights to improve F/OSS sustainability.

7 ACKNOWLEDGMENTS

The authors would like to thank Alexandra Paxton, Nelle Varoquaux, and Chris Holdgraf for their ongoing feedback, as well as Linwei Lu, Julio Gonzalez, and the CSCW reviewers for their insights. We are thankful to the cohort, advisors, and program managers of the Ford/Sloan Critical Digital Infrastructures Initiative for helping us plan this research. We appreciate the time our anonymous interviewees spent talking with us and reviewing various drafts of this work. We are thankful to Stacey Dorton for administrative support. This work has been financially supported by the Ford and Sloan Foundations through the Critical Digital Infrastructures Initiative (grant G-2018-11354), the National Science Foundation (grant DDRIG #1947213), as well as the Gordon & Betty Moore Foundation (grant GBMF3834) and the Alfred P. Sloan Foundation (grant 2013-10-27) through the Moore-Sloan Data Science Environments grant to UC-Berkeley.

REFERENCES

[1] Morgan G. Ames, Daniela K. Rosner, and Ingrid Erickson. 2015. Worship, faith, and evangelism: Religion as an ideological lens for engineering worlds. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing. ACM, New York, 69–81. https://doi.org/10.1145/2675133.2675282

[2] Brian Anderson. 2017. How Rust is tested. https://brson.github.io/2017/07/10/how-rust-is-tested

[3] Karen S Baker, David Ribes, Florence Millerand, and Geoffrey Bowker. 2005. Interoperability strategies for scientific cyberinfrastructure: Research and practice. In Proceedings of the American Society for Information Science and Technology (2005). https://doi.org/10.1002/meet.14504201237

[4] Flore Barcellini, Françoise Détienne, and Jean-Marie Burkhardt. 2014. A situated approach of roles and participation in open source software communities. *Human–Computer Interaction* 29, 3 (2014), 205–255. https://doi.org/10.1080/07370024.2013.812409

[5] Ann Barcomb. 2016. Episodic volunteering in open source communities. In *Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering*. 1–3. https://doi.org/10.1145/2915970.2915972

[6] Yochai Benkler. 2007. *The Wealth of Networks: How Social Production Transforms Markets and Freedom*. Yale University Press.

[7] Richard Bentley, John A. Hughes, David Randall, Tom Rodden, Peter Sawyer, Dan Shapiro, and Ian Sommerville. 1992. Ethnographically-informed systems design for air traffic control. In *Proceedings of the 1992 ACM Conference on Computer-Supported Cooperative Work*. 123–129. https://doi.org/10.1145/143457.143470

[8] Magnus Bergquist and Jan Ljungberg. 2001. The power of gifts: Organizing social relationships in open source communities. *Information Systems Journal* 11, 4 (2001), 305–320.
https://doi.org/10.1046/j.1365-2575.2001.00111.x

[9] Matthew J Bietz, Eric PS Baumer, and Charlotte P Lee. 2010. Synergizing in cyberinfrastructure development. *Computer Supported Cooperative Work (CSCW)* 19, 3-4 (2010), 245–281. https://doi.org/10.1007/s10606-010-9114-y

[10] Benjamin J. Birkinbine. 2015. Conflict in the Commons: Towards a Political Economy of Corporate Involvement in Free and Open Source Software. *The Political Economy of Communication* 2, 2 (2015). http://www.polecom.org/index.php/polecom/article/view/35

[11] Herbert Blumer. 1954. What is wrong with social theory? *American Sociological Review* 19, 1 (1954), 3–10.

[12] Herbert Blumer. 1969. *Symbolic Interactionism: Perspective and Method*. University of California Press, Berkeley.

[13] Pierre Bourdieu. 1973. Cultural reproduction and social reproduction. In *Knowledge, Education, and Cultural Change*, Richard Brown (Ed.). Tavistock, London.

[14] Geoffrey C. Bowker and Susan Leigh Star. 2000. *Sorting Things Out: Classification and Its Consequences*. MIT Press.

[15] Daren C Brabham. 2013. *Crowdsourcing*. The MIT Press, Cambridge, MA.

[16] Dale A Bradley. 2006. The divergent anarcho-utopian discourses of the open source software movement. *Canadian Journal of Communication* 30, 4 (2006).

[17] A. Bruckman and A. Forte. 2008. Scaling consensus: Increasing decentralization in Wikipedia governance. In *Proceedings of the 41st Annual Hawaii International Conference on System Sciences (HICSS)* (2008). 157.

[18] Julia Bullard. 2016. Motivating invisible contributions: Framing volunteer classification design in a fanfiction repository. In *Proceedings of the 19th International Conference on Supporting Group Work* (Sanibel Island, Florida, USA) (GROUP ’16). ACM, 181–193. https://doi.org/10.1145/2957276.2957295

[19] Brett Cannon. 2017. The give and take of open source. Talk at JupyterCon 2017. O’Reilly Media. https://www.oreilly.com/radar/the-give-and-take-of-open-source/

[20] Andrea Capiluppi and Martin Michlmayr. 2007. From the cathedral to the bazaar: An empirical study of the lifecycle of volunteer community projects. In *Open Source Development, Adoption and Innovation*. Springer US, 31–44. https://doi.org/10.1007/978-0-387-72486-7_3

[21] Eugenio Capra, Chiara Francalanci, and Francesco Merlo. 2008. An empirical study on the relationship between software design quality, development effort and governance in open source projects. *IEEE Transactions on Software Engineering* 34, 6 (2008), 765–782.

[22] Adele E Clarke. 2003. Situational analyses: Grounded theory mapping after the postmodern turn. *Symbolic Interaction* 26, 4 (2003), 553–576.

[23] Adele E Clarke and Susan Leigh Star. 2008. The social worlds framework: A theory/methods package. In *The Handbook of Science and Technology Studies*. MIT Press, Cambridge, MA, 113–137.

[24] Gabriella Coleman. 2004. The political agnosticism of free and open source software and the inadvertent politics of contrast. *Anthropological Quarterly* 77, 3 (2004), 507–519.

[25] Gabriella Coleman. 2009. Code is speech: Legal tinkering, expertise, and protest among free and open source software developers. *Cultural Anthropology* 24, 3 (2009), 420–454.

[26] Gabriella Coleman. 2010. The hacker conference: A ritual condensation and celebration of a lifeworld. *Anthropological Quarterly* (2010), 47–72.

[27] Gabriella Coleman. 2012.
*Coding Freedom: The Ethics and Aesthetics of Hacking*. Princeton University Press, Princeton.

[28] The Kernel Development Community. 2018. How the development process works. The Linux Kernel documentation. https://www.kernel.org/doc/html/v4.15/process/2.Process.html

[29] Kevin Crowston. 2011. Lessons from volunteering and free/libre open source software development for the future of work. In *Researching the Future in Information Systems* (Berlin, Heidelberg, 2011) (IFIP Advances in Information and Communication Technology), Mike Chiasson, Ola Henfridsson, Helena Karsten, and Janice I. DeGross (Eds.). Springer, 215–229. https://doi.org/10.1007/978-3-642-21364-9_14

[30] Kevin Crowston, Robert Heckman, Hala Annabi, and Chengetai Masango. 2005. A structurational perspective on leadership in Free/Libre Open Source Software teams. In *Proceedings of the First International Conference on Open Source Systems* (2005).

[31] Kevin Crowston, Qing Li, Kangning Wei, U. Yeliz Eseryel, and James Howison. 2007. Self-organization of teams for free/libre open source software development. *Information and Software Technology* 49, 6 (2007), 564–575. https://doi.org/10.1016/j.infsof.2007.02.004

[32] Kevin Crowston, Kangning Wei, James Howison, and Andrea Wiggins. 2012. Free/Libre Open-Source Software Development: What We Know and What We Do Not Know. *ACM Computing Surveys (CSUR)* 44, 2 (March 2012), 35. https://doi.org/10.1145/2089125.2089127

[33] Ward Cunningham. 1992. The WyCash portfolio management system. In *Proceedings of the Object-oriented Programming Systems, Languages, and Applications (Addendum)* (Vancouver, British Columbia, Canada) (OOPSLA ’92). Association for Computing Machinery, 29–30. https://doi.org/10.1145/157709.157715

[34] Laura Dabbish, Colleen Stuart, Jason Tsay, and Jim Herbsleb. 2012. Social coding in GitHub: transparency and collaboration in an open software repository. In *Proceedings of the ACM 2012 conference on computer supported cooperative work*. ACM, New York, 1277–1286.

[35] Paul B. de Laat. 2007. Governance of open source software: state of the art. *Journal of Management & Governance* 11, 2 (2007), 165–177. https://doi.org/10.1007/s10997-007-9022-9

[36] Luiz Felipe Dias, Igor Steinmacher, Gustavo Pinto, Daniel Alencar da Costa, and Marco Gerosa. 2016. How does the shift to github impact project collaboration?. In *2016 IEEE International Conference on Software Maintenance and Evolution (ICSME)*. IEEE, 473–477.

[37] Fernando Domínguez Rubio. 2020. *Ecologies of the Modern Imagination at the Art Museum*. University of Chicago Press, Chicago.

[38] Christina Dunbar-Hester. 2019. *Hacking Diversity: The Politics of Inclusion in Open Technology Cultures*. Vol. 21. Princeton University Press.

[39] David Edgerton. 2011. *Shock of the Old: Technology and Global History Since 1900*. Oxford University Press, Oxford.

[40] Paul N Edwards, Steven J Jackson, Geoffrey C Bowker, and Cory P Knobel. 2007. Understanding infrastructure: Dynamics, tensions, and design. *Report of NSF Workshop on “History & Theory of Infrastructure: Lessons for New Scientific Cyberinfrastructures”* (2007). https://deepblue.lib.umich.edu/bitstream/handle/2027.42/49353/UnderstandingInfrastructure2007.pdf

[41] Nadia Eghbal. 2016. *Roads and bridges: The unseen labor behind our digital infrastructure*. Ford Foundation.

[42] Nadia Eghbal. 2020. *Working in Public: The Making and Maintenance of Open Source Software*.
Stripe Press.

[43] Hamid R Ekbia and Bonnie A Nardi. 2017. *Heteromation, and Other Stories of Computing and Capitalism*. MIT Press.

[44] Nathan Ensmenger. 2008. Fixing things that can never be broken: Software maintenance as heterogeneous engineering. In *Proceedings of the SHOT Conference*.

[45] Joseph Feller, Patrick Finnegan, Brian Fitzgerald, and Jeremy Hayes. 2008. From Peer Production to Productization: A Study of Socially Enabled Business Exchanges in Open Source Service Networks. *Information Systems Research* 19, 4 (2008), 475–493. https://doi.org/10.1287/isre.1080.0207

[46] Anna Filippova and Hichang Cho. 2015. Mudslinging and Manners: Unpacking Conflict in Free and Open Source Software. In *Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing* (CSCW ’15). ACM, 1393–1403. https://doi.org/10.1145/2675133.2675254

[47] Brian Fitzgerald. 2006. The Transformation of Open Source Software. *MIS Quarterly* 30, 3 (2006), 587–598. https://doi.org/10.2307/25148740

[48] Lee Fleming and David M Waguespack. 2007. Brokerage, boundary spanning, and leadership in open innovation communities. *Organization Science* 18, 2 (2007), 165–180.

[49] Karl Fogel. 2005. *Producing Open Source Software: How to Run a Successful Free Software Project*. O’Reilly Media.

[50] Linux Foundation. [n.d.]. The Bylaws of the Linux Foundation. https://www.linuxfoundation.org/en/bylaws/

[51] Sarah E. Fox, Kiley Sobel, and Daniela K. Rosner. 2019. Managerial Visions: Stories of upgrading and maintaining the public restroom with IoT. In *Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems*. 1–15.

[52] Erich Gamma. 2005. Agile, open source, distributed, and on-time: Inside the eclipse development process. In *International Conference on Software Engineering: Proceedings of the 27th International Conference on Software Engineering*, Vol. 15. 4–4.

[53] Juan Mateos Garcia, W Edward Steinmueller, et al. 2003. *The open source way of working: A new paradigm for the division of labour in software development?* SPRU.

[54] R. Stuart Geiger. 2011. The Lives of Bots. In *Wikipedia: A Critical Point of View*, G. Lovink and N. Tkacz (Eds.). Institute of Network Cultures, 78–93. http://www.stuartgeiger.com/lives-of-bots-wikipedia-cpov.pdf

[55] R. Stuart Geiger and David Ribes. 2011. Trace ethnography: Following coordination through documentary practices. In *2011 44th Hawaii International Conference on System Sciences*. IEEE, 1–10.

[56] Matt Germonprez, Julie E. Kendall, Kenneth E. Kendall, Lars Mathiassen, Brett Young, and Brian Warner. 2016. A Theory of Responsive Design: A Field Study of Corporate Engagement with Open Source Communities. *Information Systems Research*.

[57] Matt Germonprez, Georg J.P. Link, Kevin Lumbard, and Sean Goggins. 2018. Eight Observations and 24 Research Questions About Open Source Projects: Illuminating New Realities. *Proc. ACM Hum.-Comput. Interact.* 2, CSCW, Article 57 (2018), 22 pages. https://doi.org/10.1145/3274326

[58] Elihu M Gerson. 2008. Reach, bracket, and the limits of rationalized coordination: Some challenges for CSCW. In *Resources, Co-Evolution and Artifacts*. Springer, 193–220.

[59] Paola Giuri, Francesco Rullani, and Salvatore Torrisi. 2008. Explaining leadership in virtual teams: The case of open source software. *Information Economics and Policy* 20, 4 (2008), 305–315.

[60] Stephen Graham and Nigel Thrift. 2007.
Out of order: Understanding repair and maintenance. *Theory, Culture & Society* 24, 3 (2007), 1–25.

[61] Kaj Grønbæk, Morten Kyng, and Preben Mogensen. 1992. CSCW challenges in large-scale technical projects—a case study. In *Proceedings of the 1992 ACM Conference on Computer-supported Cooperative Work*. 338–345.

[62] Scott Hanselman. 2015. *Bring Kindness back to Open Source*. https://www.hanselman.com/blog/bring-kindness-back-to-open-source

[63] Michael Hilton, Timothy Tunnell, Kai Huang, Darko Marinov, and Danny Dig. 2016. Usage, Costs, and Benefits of Continuous Integration in Open-Source Projects. In *Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE 2016)*. Association for Computing Machinery, New York, NY, USA, 426–437. https://doi.org/10.1145/2970276.2970358

[64] Eric von Hippel. 2001. Innovation by User Communities: Learning from Open-Source Software. *MIT Sloan Management Review* 42, 4 (2001), 82–82. https://go.gale.com/ps/i.do?p=AONE&sw=w&issn=15329194&v=2.1&it=r&id=GALE%7CA77578225&sid=googleScholar&linkaccess=abs

[65] Arlie Russell Hochschild. 1983. *The Managed Heart: Commercialization Of Human Feeling*. University of California Press, Oakland, California.

[66] Lara Houston, Steven J. Jackson, Daniela K. Rosner, Syed Ishtiaque Ahmed, Meg Young, and Laewoo Kang. 2016. Values in Repair. In *Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems* (San Jose, California, USA) (CHI ’16). ACM, 1403–1414. https://doi.org/10.1145/2858036.2858470

[67] Dorothy Howard and R. Stuart Geiger. 2019. Ethnography, Genealogy, and Political Economy in the Post-Market Era of Free & Open-Source Software. In *Proceedings of CSCW ’19 Extended Abstracts*.

[68] Dorothy Howard and Lilly Irani. 2019. Ways of Knowing When Research Subjects Care. In *Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems*. 1–16.

[69] James Howison. 2015. Sustaining scientific infrastructures: transitioning from grants to peer production. In *iConference 2015*. https://www.ideals.illinois.edu/handle/2142/73439

[70] Lilly C Irani and M Six Silberman. 2013. Turkopticon: Interrupting worker invisibility in amazon mechanical turk. In *Proceedings of the SIGCHI Conference on Human Factors in Computing Systems*. 611–620.

[71] Lilly C. Irani and M. Six Silberman. 2016. Stories we tell about labor: Turkopticon and the trouble with "design". In *Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems* (San Jose, California, USA) (CHI ’16). ACM, 4573–4586. https://doi.org/10.1145/2858036.2858592

[72] Steven J. Jackson, Syed Ishtiaque Ahmed, and Md. Rashidujjaman Rifat. 2014. Learning, innovation, and sustainability among mobile phone repairers in Dhaka, Bangladesh. In *Proceedings of the 2014 conference on Designing interactive systems* (Vancouver, BC, Canada) (DIS ’14). Association for Computing Machinery, 905–914. https://doi.org/10.1145/2598510.2598576

[73] Steven J Jackson, Alex Pompe, and Gabriel Krieshok. 2012. Repair worlds: maintenance, repair, and ICT for development in rural Namibia. In *Proceedings of the ACM 2012 conference on Computer Supported Cooperative Work*. 107–116.

[74] C. Jensen and W. Scacchi. 2005.
Collaboration, Leadership, Control, and Conflict Negotiation and the Netbeans.org Open Source Software Development Community. In *Proceedings of the 38th Annual Hawaii International Conference on System Sciences*. 196b–196b. https://doi.org/10.1109/HICSS.2005.147

[75] Helena Karasti, Karen S. Baker, and Florence Millerand. 2010. Infrastructure time: Long-term matters in collaborative development. *Computer Supported Cooperative Work (CSCW)* 19, 3 (2010), 377–415. https://doi.org/10.1007/s10606-010-9113-z

[76] Christopher M Kelty. 2008. *Two Bits: The Cultural Significance of Free Software*. Duke University Press.

[77] Christopher M Kelty. 2013. There is no free software. *The Journal of Peer Production* (2013). Issue 3. http://peerproduction.net/issues/issue-3-free-software-epistemics/debate/there-is-no-free-software/

[78] Mathias Klang. 2005. Free software and open source: The freedom debate and its consequences. *First Monday* 10, 3 (2005).

[79] Nolan Lawson. 2017. What it feels like to be an open-source maintainer. Read the Tea Leaves. https://nolanlawson.com/2017/03/05/what-it-feels-like-to-be-an-open-source-maintainer/

[80] Charlotte P. Lee, Paul Dourish, and Gloria Mark. 2006. The human infrastructure of cyberinfrastructure. In Proceedings of the 2006 20th Anniversary Conference on Computer Supported Cooperative Work (Banff, Alberta, Canada) (CSCW ’06). ACM, New York, NY, USA, 483–492. https://doi.org/10.1145/1180875.1180950

[81] Charlotte P Lee and Drew Paine. 2015. From The matrix to a model of coordinated action (MoCA): A conceptual framework of and for CSCW. In Proceedings of the 18th ACM Conference on Computer-supported Cooperative Work & Social Computing. 179–194.

[82] Yan Li, Chuan-Hoo Tan, and Hock-Hai Teo. 2012. Leadership characteristics and developers’ motivation in open source software development. Information & Management 49, 5 (2012), 257–267.

[83] Yu-Wei Lin, Jo Bates, and Paula Goodale. 2016. Co-observing the weather, co-predicting the climate: Human factors in building infrastructures for crowdsourced data. Science and Technology Studies 29, 3 (2016), 10–27. http://dspace.stir.ac.uk/handle/1893/26101

[84] Arwid Lund. 2017. Wikipedia, Work and Capitalism. Springer, London.

[85] Jennifer Helene Maher. 2015. Software Evangelism and the Rhetoric of Morality: Coding Justice in a Digital Democracy. Routledge, London.

[86] George E. Marcus. 1995. Ethnography in/of the World System: The Emergence of Multi-Sited Ethnography. Annual Review of Anthropology 24, 1 (1995), 95–117. https://doi.org/10.1146/annurev.an.24.100195.000523

[87] M Lynne Markus. 2007. The governance of free/open source software projects: monolithic, multidimensional, or configurational? Journal of Management & Governance 11, 2 (2007), 151–163.

[88] Steve Marquess. 2014. Of Money, Responsibility, and Pride. http://veridicalsystems.com/blog/of-money-responsibility-and-pride/

[89] Alice Marwick and Danah Boyd. 2011. To see and be seen: Celebrity practice on Twitter. Convergence 17, 2 (2011), 139–158.

[90] Ashwin Mathew and Coye Cheshire. 2017. Risky Business: Social Trust and Community in the Practice of Cybersecurity for Internet Infrastructure. IEEE. https://doi.org/10.24251/HICSS.2017.283

[91] Ashwin J. Mathew. 2016. The myth of the decentralised internet. *Internet Policy Review* 5, 3 (2016).
https://policyreview.info/articles/analysis/myth-decentralised-internet

[92] Amanda Menking and Ingrid Erickson. 2015. The heart work of Wikipedia: Gendered, emotional labor in the world’s largest online encyclopedia. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. ACM, 207–210.

[93] Robert K Merton. 1968. The Matthew effect in science: The reward and communication systems of science are considered. Science 159, 3810 (1968), 56–63.

[94] Audris Mockus, Roy T. Fielding, and James Herbsleb. 2000. A case study of open source software development: the Apache server. In Proceedings of the 22nd International Conference on Software Engineering (Limerick, Ireland) (ICSE ’00). Association for Computing Machinery, 263–272. https://doi.org/10.1145/337180.337209

[95] Janice M. Morse and Lauren Clark. 2019. The nuances of grounded theory sampling and the pivotal role of theoretical sampling. The SAGE Handbook of Current Developments in Grounded Theory (2019), 145–166.

[96] Chandra Mukerji. 1989. A Fragile Power: Scientists and the State. Princeton University Press.

[97] Joel Novek. 2002. IT, gender, and professional practice: Or, why an automated drug distribution system was sent back to the manufacturer. Science, Technology, & Human Values 27, 3 (2002), 379–403. https://doi.org/10.1177/016224390202700303

[98] Wanda Orlikowski and Susan Scott. 2008. Sociomateriality: Challenging the separation of technology, work and organization. The Academy of Management Annals 2, 1 (2008), 433–474.

[99] Julian E Orr. 2016. Talking About Machines: An Ethnography of a Modern Job. Cornell University Press, Ithaca.

[100] Mathieu O’Neil, Laure Muselli, Mahin Raissi, and Stefano Zacchiroli. 2020. ‘Open source has won and lost the war’: Legitimising commercial–communal hybridisation in a FOSS project. New Media & Society (2020), 1461444820907022.

[101] Elena Parmiggiani. 2017. This Is Not a Fish: On the Scale and Politics of Infrastructure Design Studies. Computer Supported Cooperative Work (CSCW) 26, 1 (2017), 205–243. https://doi.org/10.1007/s10606-017-9266-0

[102] Eric Raymond. 1999. The cathedral and the bazaar. In Readings in Cyberethics, Richard A. Spinello and Herman T. Tavani (Eds.). O’Reilly Press.

[103] RedHat. 2020. The State of Enterprise Open Source. https://www.redhat.com/cms/managed-files/rh-enterprise-open-source-report-detail-f21756-202002-en.pdf

[104] David Ribes. 2014. Ethnography of scaling, or, how to fit a national research infrastructure in the room. In Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing. 158–170.

[105] David Ribes and Thomas A Finholt. 2007. Tensions across the scales: planning infrastructure for the long-term. In Proceedings of the 2007 International ACM Conference on Supporting Group Work. 229–238.

[106] David Ribes and Thomas A Finholt. 2009. The long now of infrastructure: Articulating tensions in development. Journal of the Association for Information Systems (JAIS) (2009).

[107] David Ribes, Steven Jackson, R. Stuart Geiger, Matthew Burton, and Thomas Finholt. 2013. Artifacts that organize: Delegation in the distributed organization. *Information and Organization* 23, 1 (2013), 1–14.

[108] David Ribes and Charlotte P Lee. 2010. Sociotechnical studies of cyberinfrastructure and e-research: Current themes and future trajectories.
*Computer Supported Cooperative Work (CSCW)* 19, 3-4 (2010), 231–244.

[109] Dirk Riehle, Philipp Riemer, Carsten Kolassa, and Michael Schmidt. 2014. Paid vs. Volunteer Work in Open Source. In *Proceedings of the 47th Hawaii International Conference on System Sciences*. 3286–3295. https://doi.org/10.1109/HICSS.2014.407

[110] Daniela K. Rosner. 2014. Making Citizens, Reassembling Devices: On Gender and the Development of Contemporary Public Sites of Repair in Northern California. *Public Culture* 26, 1 (2014), 51–77. https://doi.org/10.1215/08992363-2346250

[111] Andrew L. Russell and Lee Vinsel. 2018. After Innovation, Turn to Maintenance. *Technology and Culture* 59, 1 (2018), 1–25. https://doi.org/10.1353/tech.2018.0004

[112] Bert M Sadowski, Gaby Sadowski-Rasters, and Geert Duysters. 2008. Transition of governance in a mature open software source community: Evidence from the Debian case. *Information Economics and Policy* 20, 4 (2008), 323–332.

[113] Salvatore Sanfilippo. 2019. *The struggles of an open source maintainer*. http://antirez.com/news/129

[114] Trebor Scholz. 2008. Market ideology and the myths of Web 2.0. *First Monday* 13, 3 (2008).

[115] Clay Shirky. 2010. *Cognitive Surplus: Creativity and Generosity in a Connected Age*. Penguin UK.

[116] Susan Leigh Star. 1999. The ethnography of infrastructure. *American Behavioral Scientist* 43, 3 (1999), 377–391.

[117] Susan Leigh Star and Anselm Strauss. 1999. Layers of silence, arenas of voice: The ecology of visible and invisible work. *Computer Supported Cooperative Work (CSCW)* 8, 1-2 (1999), 9–30.

[118] Anselm Strauss. 1988. The articulation of project work: An organizational process. *Sociological Quarterly* 29, 2 (1988), 163–178.

[119] Anselm Strauss and Juliet Corbin. 1994. Grounded theory methodology. *Handbook of Qualitative Research* 17 (1994), 273–85.

[120] Lucy Suchman. 1995. Making work visible. *Commun. ACM* 38, 9 (1995), 56–64.

[121] Lucy Suchman. 2007. *Human-machine Reconfigurations: Plans and Situated Actions*. Cambridge University Press.

[122] E. Summerson Carr and Michael Lempert. 2016. *Scale: Discourse and Dimensions of Social Life*. University of California Press.

[123] Don Tapscott and Anthony D Williams. 2008. *Wikinomics: How Mass Collaboration Changes Everything*. Penguin.

[124] Nathaniel Tkacz. 2014. *Wikipedia and the Politics of Openness*. University of Chicago Press.

[125] Linus Torvalds and David Diamond. 2002. *Just for Fun: The Story of an Accidental Revolutionary*. Harper Business.

[126] Jason Tsay, Laura Dabbish, and James Herbsleb. 2014. Let’s Talk about It: Evaluating Contributions through Discussion in GitHub. In *Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering* (Hong Kong, China) (FSE 2014). ACM, New York, NY, USA, 144–154. https://doi.org/10.1145/2635868.2635882

[127] José Van Dijck and David Nieborg. 2009. Wikinomics and its discontents: A critical analysis of Web 2.0 business manifestos. *New Media & Society* 11, 5 (2009), 855–874.

[128] Kazys Varnelis. 2008. *Invisible City: Telecommunication*. Actar Barcelona, New York.

[129] Juhani Warsta and Pekka Abrahamsson. 2003. Is open source software development essentially an agile method? In *Proceedings of the 3rd Workshop on Open Source Software Engineering*. 143–147.

[130] Steve Weber. 2004.
*The Success of Open Source*. Harvard University Press.

[131] Kangning Wei, Kevin Crowston, U. Yeliz Eseryel, and Robert Heckman. 2017. Roles and politeness behavior in community-based free/libre open source software development. *Information & Management* 54, 5 (2017), 573–582. https://doi.org/10.1016/j.im.2016.11.006

[132] Andrea Wiggins. 2013. Free as in puppies: compensating for ICT constraints in citizen science. In *Proceedings of the 2013 Conference on Computer Supported Cooperative Work*. 1469–1480.

[133] Susan Winter, Nicholas Berente, James Howison, and Brian Butler. 2014. Beyond the organizational ‘container’: Conceptualizing 21st century sociotechnical work. *Information and Organization* 24, 4 (2014), 250–269.

[134] Alexey Zagalsky, Carlos Gómez Teshima, Daniel M German, Margaret-Anne Storey, and Germán Poo-Caamaño. 2016. How the R community creates and curates knowledge: A comparative study of stack overflow and mailing lists. In *Proceedings of the 13th International Conference on Mining Software Repositories*. 441–451.

Received June 2020; revised October 2020; accepted December 2020
{"id": "bf19e5341e57a6c970f0b7c548e8062375bf3750", "text": "\u201cNip it in the Bud\u201d: Moderation Strategies in Open Source Software Projects and the Role of Bots\n\nJANE HSIEH, Carnegie Mellon University, USA\nJOSELYN KIM, Carnegie Mellon University, USA\nLAURA DABBISH, Carnegie Mellon University, USA\nHAIYI ZHU, Carnegie Mellon University, USA\n\nMuch of our modern digital infrastructure relies critically upon open sourced software. The communities responsible for building this cyberinfrastructure require maintenance and moderation, which is often supported by volunteer efforts. Moderation, as a non-technical form of labor, is a necessary but often overlooked task that maintainers undertake to sustain the community around an OSS project. This study examines the various structures and norms that support community moderation, describes the strategies moderators use to mitigate conflicts, and assesses how bots can play a role in assisting these processes. We interviewed 14 practitioners to uncover existing moderation practices and ways that automation can provide assistance. Our main contributions include a characterization of moderated content in OSS projects, moderation techniques, as well as perceptions of and recommendations for improving the automation of moderation tasks. We hope that these findings will inform the implementation of more effective moderation practices in open source communities.\n\nCCS Concepts: \u2022 Human-centered computing \u2192 Open source software; Empirical studies in HCI; Empirical studies in collaborative and social computing.\n\nAdditional Key Words and Phrases: moderation, automation, coordination, open source\n\nACM Reference Format:\nJane Hsieh, Joselyn Kim, Laura Dabbish, and Haiyi Zhu. 2023. \"Nip it in the Bud\": Moderation Strategies in Open Source Software Projects and the Role of Bots. Proc. ACM Hum.-Comput. Interact. 7, CSCW2, Article 301 (October 2023), 29 pages. https://doi.org/10.1145/3610092\n\n1 INTRODUCTION\n\nOnline social coding platforms such as GitHub facilitate the production of open source software (OSS), which modern digital infrastructure relies heavily upon. However, excess volumes of issues and requests filed by users can overload volunteer project maintainers [86]. Aggravating the situation, open source developers can become toxic and hostile in the course of technical or ideological disagreements [26]. Incivility suppresses productivity, creativity and quality in the workplace [84], and for semi-professional software production platforms like GitHub, such misbehaviors have caused growing concerns over the mental well-being of contributors and maintainers [73].\n\nModeration, as a non-technical form of labor, is a necessary but often overlooked and understudied task that maintainers undertake to sustain the community around an OSS project. To date, it is not well understood how maintainers grapple with toxic or undesirable behavior on their projects, particularly at scale. Research has described the different types of conversations around code...\ncontributions [104] and categorized toxic content as insults, trolling, as well as displays of arrogance and entitlement [26, 75]. At the same time we know that responding to issues and pull requests is an important part of maintenance work in open source [34, 43]. 
As Geiger points out, maintainers must delicately navigate instances where there is a mismatch between the work required to merge a contribution and non-maintainers’ desires to integrate a certain piece of functionality [43]. We also know that responses and interactions around public conversations on OSS projects in GitHub are an important signal to potential contributors and users of a project, underscoring the importance of dealing with toxicity [30, 85].

A growing body of research in CSCW examines how users moderate their own content in online communities and increasingly leverage automation to more efficiently control bad behavior [66, 93, 106]. These studies describe the challenges of moderation on different platforms (e.g. [56, 58]), explore novel moderation techniques and tools (e.g. [21, 22]), and examine the effectiveness of different moderation behaviors and strategies (e.g. [92, 93]). For example, Jhaver et al. find that moderation transparency matters: offering removal explanations on Reddit reduces the likelihood of future post removals [55]. In [66], Lampe and Resnick observe that timeliness trades off with accuracy in distributed moderation systems like Slashdot. And while automated moderation systems scale well in removing obviously undesirable content (e.g. spam and malware links), Chancellor et al. note how they can magnify errors [20], making human decisions preferable for nuanced [19, 54] and high-stakes contexts [47].

But the online social media communities and platforms studied in much of the moderation research differ in three important ways from open source software development. First, social media forums and text-based discussion groups are typically informal public spaces where people gather to share compelling and interesting information, converse, and build communities [81], whereas open source communities aim to collaboratively produce software, which can entail complex organizational structures and highly technical discussions tied to code artifacts and software that is utilized for professional purposes [14]. Second, each individual contributor’s activities on GitHub have implications for employment prospects and reputation, both within OSS and the professional community more broadly [1]. Unlike their peers on discussion groups like Reddit, where participants are pseudonymous or anonymous (choices that may lead to disinhibition online [67, 99, 111]), a large portion of GitHub users are identified by their real names, and often their accounts are listed on personal CVs or resumes [29, 95]. Finally, the types of inappropriate behaviors and harmful content that present in OSS communities diverge from what is traditionally found in social media. Past work has uncovered that passive-aggressive behaviors such as name-calling and entitlement are more prevalent among conversations between OSS developers [36, 75], and our findings support these results. Thus, the distinctive user goals, behaviors, and inappropriate content found in OSS communities might necessitate the adoption of unconventional moderation strategies.

In this study, we qualitatively examine community moderation in open source repositories, that is, the existing strategies, structures, and techniques used for mitigating and preventing inappropriate activity and conversation. Moderation here includes activities to manage behavior in conversations around issues and code contributions as well as the code itself (e.g. the use of potentially offensive variable names).
Specifically, we sought to answer the following research questions to investigate moderation in open source communities:

**Research questions:**

(1) What does moderation look like in OSS?
 (a) Who performs moderation actions in projects, and in what capacity?
 (b) What strategies do moderators use to respond to, defuse and prevent conflicts?
(2) What are the current limitations of automation for moderation, and what are potential future improvements?

1 choices that may lead to disinhibition online [67, 99, 111]

In order to address these questions, we conducted interviews with 14 maintainers across 10 projects to identify how moderation actions are performed on projects of different scales as well as attitudes towards algorithmic support for toxicity moderation and prevention. We find that (1) moderation in open source is conducted by different roles depending on the size and structure of projects, (2) moderators leverage several strategies to mitigate and prevent emergent conflicts, and (3) future efforts will need to address concerns around customizability and detection accuracy before automation tools can be deployed to help offload the labor of moderation. By documenting the structures and forms of labor performed around moderation within open source projects, we hope to inform future practitioners of the available strategies for moderating digitally-mediated software development contexts. By characterizing the potentials and limitations of automation tools for moderation, we support practitioners in understanding and anticipating the challenges and impacts of adopting such automation. We also encourage tool designers and developers to build on these findings, so that future tools for moderation can provide improved and wider services to open source community members.

2 BACKGROUND

Software development is a product-oriented and collaborative endeavor, making open source development environments semi-formal working spaces that expect professional conduct from their participants. During the development process, project collaborators may encounter a myriad of technical and interpersonal conflicts that impede their work. In the following we present our study platform, notable types of open source conflicts that prior work and our participants have reported, as well as relevant prior work on moderation and automation for toxicity detection in various online communities.

2.1 Study Context: GitHub

We focused our data collection on open source software projects hosted on the GitHub platform. GitHub facilitates collaboration and communication among developers, users and owners of software projects [30, 71], and is arguably the most popular hosting site for projects. As of June 2022, GitHub reports having over 83 million developers [5] and more than 200 million repositories (including at least 28 million public repositories).

Projects on GitHub are organized into code repositories (or repos for short) which can be owned by a personal account (usually the creator or another maintainer) or by an organization, which comprises multiple users. Collaborators on a repository have direct write access to make commits, and they work together with the owner to maintain the project. Users primarily consist of software consumers, and they can star repos to express interest in a project or to save it for later reference. Within each repository, contributors and users can plan work, track bugs, request new features or express maintenance concerns by creating issues [71].
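As a concrete illustration of the issue mechanism just described, the sketch below opens an issue through GitHub's REST API using the Octokit client. This is only a minimal sketch under assumed conditions: the repository coordinates, issue text, and label are invented for the example, and a real client would supply its own.

```typescript
// Minimal sketch: filing an issue on a GitHub repository via the REST API.
// "example-org"/"example-repo" and the issue content are placeholders.
import { Octokit } from "@octokit/rest";

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

async function fileBugReport() {
  // Creates a new issue on the repository's issue tracker.
  await octokit.rest.issues.create({
    owner: "example-org",
    repo: "example-repo",
    title: "Crash when parsing an empty config file",
    body: "Steps to reproduce:\n1. Create an empty config file\n2. Run the app",
    labels: ["bug"],
  });
}

fileBugReport();
```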
When an external (non-collaborator) developer has changes to propose, they can submit a pull request \u2013 a special issue for posting code contributions so that others can review and integrate them into the existing codebase. However, a pull request requires approval by one or more authorized collaborators before it can be merged. To communicate about developments, collaborators can comment under issues, pull requests as well as lines of code.

2.2 Conflicts and Incivility in Open Source

Open source project maintainers are responsible for tremendous amounts of unseen civic labor that underlies our digital infrastructure, and many have documented how overwhelming volumes of such invisible labor engender harm to maintainers\u2019 mental well-being [33, 68]. Maintainers are seldom recognized sufficiently for their stewardship, causing individual stress and burnout [86], leaving projects undermaintained, and threatening the overall sustainability of the open source ecosystem [26, 86]. Due to factors like a lack of corporate management structure and geographic dispersion, open source maintainers are required to undertake a plethora of complex interpersonal and organizational work [43]. Community maintenance tasks include providing support to internal contributors as well as technical assistance to external users so they can make use of the product. Previous investigations have found such organizational and interpersonal labor to play a critical role in traditional software engineering contexts [74, 80, 101]. Due to the fully public and largely voluntary nature of discussions and actions in open source development, moderation is one necessary task that maintainers must undertake to avoid an overwhelming amount of negative content and harmful interactions.

Prior work extensively documented the presence of incivility, conflict and, in general, negative emotions across multiple activities of open source development, including code reviews [10, 11, 16, 31, 32, 36, 87], issue discussions [37, 75] as well as in comments to these actions [41, 50]. Negative interactions occur among different members of the community (e.g. core collaborators, external contributors as well as maintainers across different projects), and stem from multiple grounds, ranging from language and cultural differences to political disagreements to personal feuds to software dependencies to mismatches in expectations [38, 43, 71]. Conflicts among internal contributors can be difficult to moderate, since organization members cannot ban each other and interventions between familiar and respected contributors can get tricky, but politically charged misconduct from external and banned members can be harmful as well. Uncivil behaviors present in the semi-professional volunteer software development environment endanger the sustainability of open source by decreasing the intrinsic motivation of contributors, reducing their productivity, and heightening dropout rates of newcomers [63, 76, 84]. Rather than categorizing the types of conflict in open source (the focus of [26, 38, 75]), one aim of our study (via RQ1) is to characterize strategies and structures that maintainers use to moderate such uncivil situations.

While incivility originating from internal contributors of a project has been well-studied [10, 11, 16, 31, 32, 36, 87, 102], frustration that follows from unrealistic expectations of user support can cause toxic insults [75] and entitlement directed at maintainers, demanding their time and attention [43].
User support involves providing assistance to consumers of the software who have difficulties making use of it either because of existing defects or the consumer\u2019s misunderstanding of some aspect of the software [65]. Swarts identified usability and transparency issues as causes of user needs in open source [100]. As projects scale, user support becomes a tedious task, overwhelming maintainers with issues and requests, demanding their time and emotional labor [43]. Unlike commercial vendors that generally rely on institutional infrastructures such as paid and dedicated IT or tech support teams, open source software provides informational user support free of charge, via a small group of volunteer users and maintainers [65].

2.3 Governance and Moderation

The non-technical labor of moderation is often overlooked but essential for understanding the infrastructure of open source [33, 69, 88]. According to Grimmelmann, moderation consists of \u201cgovernance mechanisms that structure participation in a community to facilitate cooperation and prevent abuse\u201d [48]. For the context of open source, we define community moderation to be the set of activities that maintainers and designated moderators leverage to manage behavior in conversations around issues, code contributions, and the code itself, in an effort to minimize harmful and abusive activities and to foster a collaborative and welcoming environment for contributors.

Much like their social media (e.g. Discord [58], Reddit [23, 62], Twitter [57]) and peer production contemporaries (e.g. Wikipedia [12, 40, 44]), GitHub communities engage in volunteer-based community moderation, as opposed to platform-wide commercial moderation. The voluntary nature of moderation and maintenance in open source forces members of the community (e.g., maintainers or volunteer contributors) to bear the responsibility of providing support and assistance to users. But unlike support providers of commercial software products, the services of volunteer contributors are uncompensated [65]. Exacerbating their workload, prior studies reported that maintainers found user support to be an \u201coverwhelming and never-ending chore, particularly for projects that use GitHub-style collaboration platforms\u201d [43]. The staggering volume of demands for user support and feature requests on GitHub\u2019s issue-posting mechanisms demonstrates an instance of overuse \u2013 a form of deviant behavior in Grimmelmann\u2019s categorization of abuses that leads to congestion and cacophony, making it harder for information to get through and thereby hindering users\u2019 information search and retrieval processes [48].

Existing systems of platformic content moderation have been found to vary in terms of actions, styles, philosophies and values. In a systematic review of 86 related papers, Jiang et al. described such tradeoffs and compared the various moderation techniques with Grimmelmann\u2019s four broad categories [59]. These included exclusion \u2013 the act of depriving people of access to the online community, often through bans or timeouts, organizing \u2013 consisting of measures like removing and annotating content, norm-setting \u2013 a practice of issuing warnings or \u201cindirect policing\u201d to denounce bad behavior, as well as monetary pricing \u2013 a way of using market forces to raise the price of participation for users \u2013 though social media users were not found to engage with this last category [48, 59].
In a study of volunteer moderators on Reddit, Facebook and Twitch, Seering et al. showed how moderators used excluding and norm-setting actions (e.g., bans and warnings) with increasing restrictiveness and relied heavily on general community members to report and flag misbehaviors [94]. While the actions of excluding, organizing and norm-setting may be transferable to open source moderation, we expect that the distinct forms of inappropriate content might motivate the adoption of other unique strategies and governance structures. We sought to characterize the moderation structures, norms and roles involved in open source via RQ1.

While some past work examined conflict management strategies for peer review [52] and the emergence of early governance structures on GitHub [82], we lack knowledge around the specific strategies that maintainers use to moderate inappropriate and problematic behaviors in open source. Among the many forms of intervention techniques available for such purposes, Renee et al. investigated how the code of conduct, a document that \u201cdefines standards for how to engage in a community... signals an inclusive environment that respects all contributions ...[and] outlines procedures for addressing problems between members\u201d [6], is used for moderation. Other moderation tools include documents such as contributing guidelines (\u201cwhich provides potential project contributors with a short guide to how they can help with your project\u201d [9]), moderation policies, or built-in platform features such as bans and the locking of conversations [8]. However, Geiger et al. uncovered that contributors are not as intrinsically motivated to engage in non-technical maintenance work (e.g. community support and documentation) as they are to complete more technical tasks (e.g. feature implementation or debugging) [45], indicating a need for more comprehensive and higher-level strategies for conducting moderation for complex situations and interpersonal conflicts. Maintainers can be especially discouraged from performing moderation work since it has been found to cause psychological and emotional distress [98], and automated assistance to moderation can be an appealing solution, with the potential to minimize maintainers\u2019 time and labor on tedious tasks and increase developer productivity [35, 107]. However, there exists a gap in our understanding of how OSS moderation is executed in practice, both in terms of strategies as well as the roles and structures that are established to support and facilitate moderation. In this study, we qualitatively investigate such infrastructures and approaches, as well as uncover maintainers\u2019 perspectives on how automation can support moderation.

2 though GitHub did develop a set of platform-wide Acceptable Use Policies [7]

2.4 Automated Moderation Bots in Open Source

Sentiment Bot and Safe Space are examples of tools that leverage existing sentiment analysis models to help maintainers detect and regulate the existence of toxic comments on GitHub. The Sentiment Bot is a GitHub App built with GitHub\u2019s Probot framework that \u201creplies to toxic comments with a maintainer designated reply and a link to the repo\u2019s code of conduct\u201d [4] while Safe Space is a GitHub action that leverages TensorFlow\u2019s toxicity classification model to \u201cdetect potential toxic comments added to PRs and issues so authors can have a chance to edit them and keep repos a safe space\u201d [3].
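To make these mechanics concrete, the sketch below shows what a bot in the spirit of the Sentiment Bot might look like as a Probot app written in TypeScript. This is a hedged illustration, not the actual implementation of either tool: the `scoreToxicity` helper and its 0.8 threshold are hypothetical placeholders for whatever classifier a maintainer wires in.

```typescript
// Minimal Probot app in the spirit of the Sentiment Bot: watch new issue and
// PR comments, and nudge authors of likely-toxic comments toward the
// project's code of conduct.
import { Probot } from "probot";

// Hypothetical classifier stub returning a toxicity score in [0, 1];
// a real bot would call an actual toxicity model here.
async function scoreToxicity(text: string): Promise<number> {
  return text.toLowerCase().includes("stupid") ? 0.9 : 0.1;
}

export default (app: Probot) => {
  app.on("issue_comment.created", async (context) => {
    const { comment } = context.payload;
    if ((await scoreToxicity(comment.body)) < 0.8) return; // arbitrary cutoff

    // Reply under the same issue/PR with a maintainer-style nudge and a
    // pointer to the repository's code of conduct.
    await context.octokit.issues.createComment(
      context.issue({
        body:
          `@${comment.user.login} this comment may violate our ` +
          `[code of conduct](CODE_OF_CONDUCT.md). ` +
          `Please consider editing or removing it.`,
      })
    );
  });
};
```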
Both of these bots make use of machine learning classifiers to detect toxic content within pull request or issue threads and respond with a comment that urges the original author to modify or delete their comment whenever problematic content is detected. Underlying such tools are sentiment analysis detectors, and numerous models have emerged in the field of software engineering to improve the accuracy and domain specificity of these models. These include classifiers of negative interactions trained on conversations surrounding issues [41, 61, 78, 86, 89], code reviews [10, 16, 37], commits [50, 51], codes of conduct [97] as well as data from other contexts such as IT support [15] and Stack Overflow [18].

However, bot use in open source contexts has its own associated challenges. Wessel et al. found that bot-generated noise (in the form of verbosity or excessive/undesirable tasks) causes annoyance for contributors, disrupts their workflow, and creates additional labor for maintainers [108]. Meanwhile, Huang et al. discovered how contributors react negatively to automated encouragements [52]. Outside of open source, Jhaver et al. described how subpar removal explanations provided by bots on Reddit brewed community resentment [55]. In voice-based communities like Discord, bots faced challenges in identifying rule violations based on nuances such as tone and accent, despite the widespread adoption of bots to automate features [58]. Jiang et al. highlighted the tradeoff that while automation helps communities achieve moderation at massive scales and with faster turnarounds, human involvement is required to understand contextual nuances, provide clear removal explanations, and conduct negotiations around norms that contribute toward community building [59]. Moderators of the three platforms that Seering et al. studied also expressed the desire to personally deal with harder, more nuanced situations, despite being content to have automated tools deal with the most egregious and unwanted content; the authors argue that these desiderata are motivated by moderators\u2019 inclination to make context-specific judgments and impact community development [94]. Smith et al. identified community values related to the design and usage of machine learning-based predictive tools in content moderation on Wikipedia [96].

In open source, maintainers\u2019 and moderators\u2019 stances toward automation are likely to differ from those of social media moderators, as open source contributors are more habituated to using tooling to increase productivity and efficiency, whereas the efficiency of moderation has been found to trade off with quality [59, 66]. RQ2 aims to provide insights on how well current moderation bots support human maintainers in open source contexts and what improvements are needed to reduce friction and concerns in adoption.

3 METHOD

To learn how maintainers and moderators maintain their communities, we interviewed 14 individuals who moderate or maintain projects of varied sizes, ranging from 500 to 87,000 stars and from 30 to 4,000 contributors. Before beginning the interview and recruitment process, we obtained institutional IRB approval, and we briefed participants on the types of questions to expect prior to starting the interviews to ensure ethicality.

3.1 Recruitment

Participants were recruited through publicly available information on GitHub. The criteria were that participants had to be (1) at least 18 years of age and (2)
either a current or past maintainer or moderator for a collaborative open source project. We started recruiting participants by emailing owners of repos that used moderation bots. But we soon realized that most of these owners had limited moderation experience, since their bot setup resulted from a forked template project. We expanded to recruiting from repos with designated moderation teams or contributing guidelines using search terms such as \u201cmoderating\u201d or \u201cmoderation team\u201d on GitHub, and also conducted snowball sampling by asking participants to refer us to other potential interviewees. If the maintainer\u2019s contact information was public, we requested an interview via email. Of the 40 potential participants we emailed, 14 agreed to be interviewed, one of whom was female \u2013 the proportion of women involved in this study is on par with the overall representation of women in open source (which is below 5%) [103]. We concluded the recruiting process when the addition of participants stopped generating new emergent themes \u2013 signaling theoretical saturation [28]. Table 1 displays a summary of participants\u2019 projects, their respective roles, as well as descriptive project information.

3.2 Interview Protocol

We started the semi-structured interviews by following a protocol of scripted questions, which included questions about negative and positive interactions, detection and moderation strategies, codes of conduct, and bot use. From each category of questions, our main goal was to learn what strategies maintainers used to respond to negative interactions such as violations of codes of conduct and issues with bot usage (after we introduced the Sentiment Bot). Specifically, we inquired about the responsibilities of moderating members, expected norms and behaviors of a community and, whenever applicable, their resolution strategies for disruptive behaviors in the past, and how they set precedents for future incidents. Each interview lasted 30-60 minutes and participants were compensated $15 for their time via PayPal or a donation to a charity or organization of their choice.

3.3 Analysis

Using interview recordings and transcripts, a team of two researchers engaged in a bottom-up thematic analysis of the interviews. The experience of this team with open source contributions ranges from novice to knowledgeable. We began the analysis of the transcribed video recordings with a shared open coding session to calibrate coding granularity. The first two authors developed the initial lower-level codes for each participant\u2019s data and synced weekly to resolve disagreements.
After resolving any disagreements amongst the coders, we conducted a bottom-up affinity diagramming process to iteratively refine and group the resulting 375 unique codes into 32 first-level themes, which were then clustered into four main themes that we present below.

| ID | Project Pseudonym | Project Area | Role on Project | # Contributors | Stars |
|----|-------------------|--------------|-----------------|----------------|-------|
| P1 | Honeysuckle | Visual diagramming platform | Maintainer/Contributor | >20 | >5k |
| P2 | Receptive | Differential Privacy Library | Maintainer/Contributor | \u223c50 | >300 |
| P3 | Apex | Runtime Environment | Moderation Team Member | >3k | >85k |
| P4 | JaguarAPI | Web framework for building APIs | Owner/Founder | \u223c300 | >40k |
| P5 | Grunge | Programming Language | Designated Moderator | >3.5k | >65k |
| P6 | Hyundai | Alternative firmware | Designated Moderator | >200 | >17k |
| P7 | Vessel | Container management | Moderation Team Member | >3k | >87k |
| P9 | | | Owner/Founder | >200 | >500 |
| P10 | Silverback | Object Storage | Owner/Founder | >90 | >9.5k |
| P11 | Community Manager | | Owner/Founder | >80 | >400 |

Table 1. Participant Summaries. Project details are partially redacted to preserve anonymity. All references to projects are by pseudonyms.

4 RESULTS

We start by characterizing types of inappropriate behaviors that moderators observed and monitored, separating the common types of rule violations found in other domains from the more implicit forms of conflicts that emerge from the technical development environment of open source. Next, we describe the types of moderation roles and structures that individuals or groups assume or set up in order to more effectively address and govern misconduct. We then discuss the specific strategies that moderators use to react to, address and prevent misbehavior and incivility. Finally, we summarize maintainers\u2019 stance around the adoption of tools to automate moderation, highlighting various concerns such as over-censorship, technical incapabilities, as well as limited customizability.

4.1 Inappropriate behaviors in OSS

In most well-studied online communities, intolerable behaviors largely comprise deliberately abusive and disruptive misconduct such as harassment and hate speech [56, 58, 94]. However, in the context of open source, explicitly inappropriate behaviors are accompanied by more subtle acts and borderline behaviors such as miscommunication and resistance against new practices. When we inquired about moderation, many maintainers brought up strategies they used to respond to and mediate miscommunication, as well as ways of organizing and curbing the excessive volume of demands. While prior works categorizing toxic behaviors on GitHub have also uncovered less severe misbehaviors such as technical disagreements and arrogance [75], we make the distinction between the clearly disruptive content that is detectable by toxicity classifiers and the more covert forms of incivility that require human judgment to identify. In the following subsection we outline some of the more disruptive acts of misconduct (e.g., hate speech, snarky humor) as well as more subtle forms of misbehaviors that OSS moderators observed and guarded against, and follow with the strategies they leveraged to address these in Section 4.3 (Moderation Strategies).

4.1.1 Explicitly Aggressive and Disruptive Behaviors. The first class of misbehaviors consisted of explicitly harmful or ill-intended content.
We start off by presenting misconduct that is obvious (e.g., spam) or egregious (e.g. hate speech, harassment), and follow with examples of more concealed (but still harmful) forms of hostility, which include passive aggressiveness and snarky humor.

**Spam, hate speech and harassment.**

- Much of P8\u2019s job as a moderator consisted of \u201cmoderating spam users\u201d, which include instances of a \u201cbot that\u2019s leaving nonsensical comments, opening garbage pull requests that are wasting people\u2019s time\u201d. Even in smaller projects such as Hyundai, \u201cspammers come with things (political) that doesn\u2019t have anything to do with Hyundai, they occur twice a year\u201d (P6).
- Hate speech like \u201csomeone coming in and saying \u2018why are you people so stupid\u2019, or worse than that\u201d can happen but fortunately \u201cthose are very spotty\u201d (P8). In one case, a banned member threatened to \u201csend collaborators bombs\u201d and afterwards \u201che got arrested, like by the FBI, because he made bombs in his house\u201d (P8).
- While commonsense rules like \u201cno sexual harassment or no discrimination\u201d seem obvious, P4 pondered how \u201cin some cases it has to be very explicitly stated, because the people that violate those things are probably the people that wouldn\u2019t guess that\u201d.

**Passive-aggressiveness & snarky humor.**

- Both destructive and contagious [77], \u201cpassive-aggressive comments\u201d sadly did appear in OSS contexts. They include arrogant \u201cthings like \u2018I have been working for 10 years 20 years and I had never seen a solution like what you\u2019re proposing\u2019 \u2013 something that is not very exclusively saying what you\u2019re proposing is dumb, but . . . kind of implicitly saying you\u2019re inexperienced . . . in a very, very hidden way\u201d (P4) or demeaning insults such as \u201ccan\u2019t you ask an intelligent question\u201d which P4 reports as content that \u201cwe often get within questions threads\u201d.
- In a similar vein, snarky humor is also advised against because \u201cit\u2019s so easy to offend someone with that\u201d because \u201cit\u2019s really hard to convey what you mean while being snarky on the internet, where nobody can see your face\u201d (P5).

**Entitled demands & heated complaints.**

- Users and contributors who felt entitled to receive responses could take up a significant amount of maintainers\u2019 time \u2013 \u201cthe thing that makes most of the time is the questions, issues\u201d and \u201c80% of the time, or like 90% it will be just like a feature request or a question or [demand for which] like \u2018I\u2019m never really in the user scope\u2019 \u201d (P4).
- While some requests are easy to address (e.g., simple questions and feature requests), others can get quite heated: \u201c[One user complained that] Hyundai was not good because it wasn\u2019t working on their device (person didn\u2019t read documentation) . . . It started off aggressive, and ended up with the user complaining the documentation wasn\u2019t good enough\u201d (P6). Ironically, \u201cin many cases [it] is just like errors in the code of the developer (who\u2019s asking) and they didn\u2019t realize\u201d (P4). Yet someone must attend to the issues because \u201cIf you ignore people they get more mad . . . and they act out more and more.\u201d (P11). And the problem with entitled comments isn\u2019t the comment itself, \u201cit\u2019s the knock on effects of that comment . . .
other people will see that and think it\u2019s okay to behave that way . . . [and] feel more entitled because they\u2019ve seen entitlement be normalized\u201d (P3).

4.1.2 Misunderstandings, technical disagreements, and resistance against new practices. In contrast to explicitly aggressive misbehaviors, our moderator participants also reported monitoring for more subtle disagreements and misunderstandings that arise from the technical and collaborative nature of OSS projects.

Aside from intentional misconduct, \u201cmany times bad behavior is just misunderstandings\u201d and \u201cit boils down to like miscommunication and not understanding the issue like people talking past each other and people getting a little bit heated\u201d (P9). According to P13, miscommunications occurred frequently: \u201cIf you dig into old threads, you see a lot of them are full of miscommunication and people shouting over each other about who should have had dealt with what\u201d.

Technical disagreements are easy to surface in development environments, because \u201csometimes people simply get riled up, they have an idea of what is right or wrong and someone else has a different idea, which in tech can happen\u201d (P5). In one instance when people did get heated after \u201ca disagreement with the licensing of the code\u201d which was \u201cfrom another project library\u201d, some contributors unfortunately \u201cfelt [the need to use] \u2018accusatory\u2019 language\u201d.

Technical projects often need to adopt new pipelines and packages to keep up with recent updates and practices, but sometimes new standards are met with \u201cresistance initially, usually because of large changes such as build pipelines\u201d (P2). So first they must \u201cget through the transition period\u201d (P3), but \u201cover time there\u2019s acceptance\u201d (P2), \u201cand the new norms will just be the way it is, and everyone will be horrified that it used to be worse\u201d (P3).

4.2 Moderation Roles and Structures

While open source communities were once perceived as decentralized and bazaar-like, emergent governance structures form over time [13, 64, 82]. Maintainers in our sample employed a plethora of strategies to overcome interpersonal and technical challenges of social coding. Depending on the size of the project or organization, maintainers varied their governing structure and strategies. Specific moderation actions were performed by members of the community, a moderation team, or maintainers themselves. The most basic form of moderation involved contributors performing self-censorship. Beyond that, volunteer moderators described how they reported potentially harmful content and actions to maintainers or formal moderation teams. In the following sections we describe how participants in our sample collaborated across different roles and governing powers to conduct moderation together.

4.2.1 Self-moderation and Volunteer Moderators. When a particular individual violated a community rule or norm, self-moderation constituted the first line of defense. Unlike the broader term of community self-moderation that Seering proposes [91], we consider self-moderation to be the individual self-corrective action of the author to edit and fix their own content, regardless of who first noticed the questionable content.
In the case of large projects like Apex, maintainers may instate \u201can explicit policy to ask organization members to self-moderate\u201d, with rules that \u201callow [maintainers] a way to say: \u2018if you just made a mistake, and you apologize and don\u2019t do the behavior again, you\u2019ll be fine\u2019. . . in a way that displays those norms for the community\u201d (P3).

Member status affected who received self-moderation requests \u2013 when the original author was \u201cnot a[n internal] collaborator, then the moderation team can just summarily do what we decide is the best\u201d (P3). However, when internal organization members exhibit problematic behaviors, \u201cthen the first thing [we are required to do] is to always ask them to self-moderate\u201d (P3). In requesting self-moderation from contributors, maintainers asked for specific actions like editing or deleting the offensive comment, so as to avoid public shaming directed at the author or other escalations.

Since social coding platforms like GitHub are working environments for producing software, team members are expected to treat each other with civility; even if \u201cYou don\u2019t have to like each other . . . you have to be professional\u201d (P13). Therefore, when P13 asked contributors who harbor negative feelings toward each other \u201cto self-moderate, . . . they did\u201d and in general \u201cpeople . . . are usually regretful that the comment was hurtful . . . [and] will be eager and happy to self-moderate.\u201d

But in some cases, uncooperative contributors refused to conduct self-moderation, and one cause of this behavior was a difference in cultures. In one case, P13 asked for self-moderation by posting a request along the lines of \u201cHey this comment is perceived as . . . problematic, can you please consider self-moderating it\u201d. If the recipient is from the US, then they would understand that \u2013 \u201cyou\u2019re really telling them to do that\u201d. But \u201cin Israeli culture, it\u2019s perfectly acceptable for them to say \u2018No, I considered it and I think I have a better understanding than you\u2019\u201d. When contributors refused to cooperate, moderators escalated to more direct measures to intervene, which we cover in 4.3.

To delegate some of their responsibilities, maintainers of more popular libraries such as Apex distributed moderation work by relying on community members: \u201cmost of the time somebody reports it . . . they can surface it [by] say[ing] \u2018hey check out this\u2019 in a moderation repo that\u2019s private to org members\u201d. While maintainers would prefer to hide contentious content from contributors and users \u2013 \u201cin an ideal world, we don\u2019t require somebody to report it before we fix it\u201d \u2013 some of it inevitably goes undetected in larger projects: \u201cthere\u2019s a scalability thing there\u201d, and community reporting can serve as \u201ca useful filter to prevent all of our time being taken up by hunting down problems\u201d (P3).

3 Acts of self-moderation erase many public records of accidentally posted harmful content, thus we suspect that the practice is prevalent but often undetected. While Apex was the only project reporting self-moderation, it is also one of the most established OSS projects. Hence we expect self-moderation to appear in other projects as well and encourage future work to explore the detection and frequency of self-initiated moderation.

4.2.2 Formal Moderation Teams.
Larger and more mature projects designated particular volunteer members from the community to form an official moderation team for the organization. The Apex moderation team, for example, consisted of \u201c8 to 10 people, 5-6 who are regularly active\u201d (P13). Moderation team members were self-nominated, and the role is not even exclusive to contributors: \u201cany member who is on the project. . . [can] say \u2018hey I want to be a moderator\u2019, and if nobody objects for seven days, they join the team\u201d. Team members are recertified annually by a Technical Steering Committee (TSC), which guides and advises the organization with higher-level directives.

Among the ten projects we interviewed, six had designated moderators, all of whom were appointed due to existing demand. For instance, when P12\u2019s moderately-sized \u201cproject first started getting popular\u201d, he had \u201cno clue how to moderate\u201d. But growing attention eventually convinced him to assign moderators: \u201cpeople were demanding moderators - so very quickly I had to choose a moderator . . . [and these] moderators [would] tell people to calm down and most people are respectful\u201d.

Maintainers often encouraged contributor interactions with moderators to help offload their maintenance responsibilities. In one case, P5 of the popular programming language Grunge would tell users \u201cIf you have a question about anything that disturbs you or that you may think has disturbed others, contact the moderators\u201d. P7 of the mature project Vessel also reports how they often redirect users and contributors to \u201ctalk to a moderator on Slack\u201d whenever the community had questions and doubts around governance actions such as instituted bans, so that moderators can provide them the appropriate explanations. Even for more nascent repos such as Silverback, P11 explicitly \u201cset up community values that proactively explains to people what the community will look like\u201d, that way \u201cif someone is blocked, and they don\u2019t know why they were blocked, or they think they should be unblocked, they know where to get in touch with us.\u201d (P8).

Outside of moderating, members may be responsible for onboarding tasks such as taking a training course: \u201cIt was an online course, we went on to Zoom (the whole team) for like a few weeks and we did a training. We should probably do another round [of refreshers] because some people joined\u201d (P13). And since the labor of performing moderation actions (e.g. providing explanations) can be draining [59], it is a moderator\u2019s own responsibility to self-assess and take breaks to avoid burnout: \u201cModeration is something I do for a while, I stop doing it for a while, I do for a while, I stop doing it for a while, \u2019cause I burn out.\u201d (P8)

4.2.3 Power Sharing Structures. While moderation team members hold the power to execute governing actions (e.g., interaction limits or bans), they also experienced power restrictions. Restrictions typically originate from higher-up governing bodies such as the Technical Steering Committee, but efforts to decentralize and democratize moderation also encourage community members to review and call out misjudgments by moderators.

Technical Steering Committees (TSCs) tend to appear only in larger projects (only 3 of the 10 projects we interviewed formed one \u2013 Apex, Vessel and Grunge), where the sizable number of internal project members calls for top-down governance.
In Apex for instance, P13 was blocked from directly removing an internal member because \u201conce you\u2019re a collaborator you can\u2019t really be removed\u201d. In order to remove an internal collaborator, \u201cthe Technical Steering Committee needs to vote . . . we [the moderation team] wouldn\u2019t [typically] remove a collaborator\u201d (P13).

The TSC shoulders many technical and governing responsibilities, serving as \u201cthe unifying factor\u201d of the project (P13). But the TSC also exhibits \u201ca very strong bias towards inaction, by design . . . because . . . making the wrong technical decision is a lot riskier than not making technical decision.\u201d Finally, the TSC also consists of \u201ca lot of people who are very technical, [so] they don\u2019t like dealing with interpersonal issues\u201d (P13). The combination of limited bandwidth, composition of technical members, and tendency toward inaction means that the TSC is slow in approving requests for actions like removing collaborators. As a result, maintainers of larger projects eventually \u201cdetermined that we needed a separate body from the Community Committee and the Technical Steering Committee to handle these [governance actions] because membership on the TSC does not mean you have any idea how to handle a code of conduct report\u201d (P8), leading to the formation of an official moderation team in project Vessel.

The TSC holds powers above the moderation team (e.g., the ability to remove internal collaborators), and moderators must additionally \u201cdo a weekly report to the TSC about what moderation actions have happened . . . [to adhere to] . . . our governance documentation\u201d (P8). In addition, moderation teams set up structures to encourage project members to check their judgments as well, so as to ensure a more democratic distribution of moderating powers:

\u201cWe always invite people to call the other mods to check that we are actually right because we get it wrong. Because otherwise if we wouldn\u2019t have rules to follow then it would be, well \u2018this mod didn\u2019t like my nose so he banned me\u2019 \u201d (P5)

4.2.4 Reporting Mechanisms. To support the reporting of misconduct from volunteers, moderators of larger projects set up \u201ca private moderation repo\u201d so that \u201ccollaborators (\u223c500-600ish of them)\u201d can \u201copen issues there to notify the moderation team that \u2018here it\u2019s something that . . . you need to look at\u2019 \u201d (P8). These moderation repos for community reporting work for larger projects because \u201cfor very contentious topics [in] issues and pull requests (which happen occasionally in most projects), someone will notice and surface it even though there\u2019s nothing bad yet\u201d. In addition to providing a centralized place for members to submit reports, this strategy enables moderation team members to \u201cstart subscribing to it and jump really quickly when something happens\u201d.

In addition to moderation efforts from the community, reports to GitHub constitute another avenue for escalation if moderators don\u2019t have the power to edit particular posts or close specific user accounts. For instance, P8 relates how \u201cThere are definitely some blind spots and missing parts . . . certain types of comments you can\u2019t edit or delete . . . that\u2019s a bit of a problem.
We have to contact GitHub if it gets really bad \u2013 but if it gets really bad you just report the user and eventually all their stuff gets deleted because GitHub just deletes the user\u201d. Spammers who occasionally attacked the mid-sized project Hyundai were dealt with in a similar way: \u201cspammers that come with things (political) that doesn\u2019t have anything to do with Hyundai, they occur twice a year and we have to delete/close issues. They are also reported to GitHub, who may close their accounts\u201d.

| Moderation Strategy | Description | Example Actions |
|---------------------|-------------|-----------------|
| Punitive | Reactive measures taken to eliminate harmful content and prohibit interactions that cause rapid and excessive negative engagements. Used in situations where someone acts in a clearly outlawed manner or where activities cause high levels of community response. | Hiding/deleting comments, bans, interaction limits, locking conversations, calling out bad behavior. |
| Mediations | Diplomatic interventions taken to resolve small-scale misunderstandings and disagreements. Used for disagreements between a small number of (usually internal) contributors. | Correcting misunderstandings, forming negotiations. |
| Preventative | Inhibitory: Precautionary measures used to prevent the development and further escalation of conflicts. Used in situations that maintainers perceive to have the potential to escalate, such as expressions of indirect hostility, inside jokes, and belittling comments. | Issuing warnings, calling out behaviors that are perceived to have the potential to escalate. |
| | Proactive: Setting up rules and workflows to avoid the repetition of similar mistakes and future user/contributor frustrations. Used after repeated offenses. | Setting up private moderation repos, codes of conduct, linters, templates, topic-specific channels. |
| Reformative | Educational approaches to rehabilitate misbehaviors and set up acceptable standards. Used after unintended neglect of rules or repeated violations by multiple members. | Offering explanations, polite admonishment. |

Table 2. Summary of Moderation Strategies

4.3 Moderation Strategies

In an ideal world, maintainers should not have to monitor and respond to negative interactions. But despite their best intentions, contributors did end up engaging in heated conversations that quickly escalated out of control. When such unexpected situations occurred, maintainers reacted by utilizing a set of existing tools on GitHub to help limit, de-escalate or remove the interaction. But sometimes it took more in-depth interventions to resolve a conflict, in which case maintainers and moderators performed the role of conciliator to mediate the dispute. Fortunately, many misbehaviors can be anticipated and prevented once moderators have witnessed and intervened in similar incidents. In such instances, moderators took preventative actions to deter further escalations and avoid future mistakes, or adopted reformative strategies so that newcomers to the project can distinguish acceptable behaviors from inappropriate ones. Once established, norms guided contributors toward more productive, healthy and efficient interactions. Table 2 shows definitions of the moderation strategies we uncovered, and example actions associated with each strategy.

4.3.1 Punitive Strategies.
Punitive strategies consist of reactive moderating actions such as content removal, bans, locking of conversations, or strict enforcement of code of conduct guidelines to eliminate harmful content and disruptive behaviors. These were usually taken immediately after severe situations such as unexpected debates and outbursts, so as to limit the impact of inappropriate actions and prevent further escalations of conflict.

When content removal was sufficient to conclude and archive an exchange, moderators simply hid or deleted comments. P3 of Apex related his preference for hiding comments over deletion since GitHub introduced public deletion receipts: \u201cI don\u2019t delete comments anymore because GitHub leaves a record that you\u2019ve done it . . . because of that it\u2019s more effective to hide them all as abusive or off-topic\u201d.

Unlike deletion (which now leaves a public trail of delete receipts), folding the content via hiding offered transparency, which was found to (1) improve legitimacy and accountability, (2) increase perceived consistency, and (3) prevent confusion and frustration [59]: \u201cPeople can still read it (which sucks), but then there\u2019s not an illusion of censorship, which is worse than people reading the content, but not as good as the content being erased.\u201d (P3). Prior to public deletion receipts, \u201cif there was a hugely toxic exchange that was irrelevant to the issue, I could sum it up and then delete all the comments and nobody had to see that toxic exchange had happened\u201d (P3).

Moderators found it was crucial to enforce existing rules to maintain a healthy and supportive environment. In the case of political spam in Hyundai, P6 recounted having to delete and close issues, as well as report the accounts to GitHub. To clearly outline desired behaviors, it can be helpful to have a \u201ccode of conduct, and being open about enforcing it helps a lot, because people know what they\u2019re getting if they go that way. . . acceptable behavior is pretty much laid out [there]\u201d (P5). P5 additionally emphasized the importance of invoking existing rules:

\u201cWe have the moderation team to enforce this; we want to be constructive at all times. We do not accept people harassing other people or calling them names or generally being negative, it\u2019s rather frowned upon. Basically we criticize the code not the person: be constructive, be on the point.\u201d

Moderators also called out clearly toxic behaviors that were not yet explicitly delineated in existing rules. For instance, within the project of the popular language Grunge: \u201cwe call out bad behavior when we see it.\u201d (P5). These concerns were raised directly on GitHub or through other media: P9 and his team members on Vessel \u201ccall out bad behavior by sending screenshots over the team\u2019s Slack\u201d while P13 and his team on Apex post in a moderation repo to encourage accountability: \u201cYou open an issue in moderation repo, so they see that you\u2019re aware of it . . . and often that\u2019s enough to get them to de-escalate and no other people are watching.\u201d

Reactive approaches require quick responses, since escalations tend to unravel quickly: \u201cEither we don\u2019t notice it, or we say \u2018hey it\u2019s banned\u2019 or \u2018work it out\u2019 . . . But . . . sometimes it\u2019s a bad thing to catch it like one day too late, and at that point it\u2019s too late.\u201d (P13).
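For readers unfamiliar with the platform mechanics, the punitive actions described above (hiding a comment rather than deleting it, locking a heated thread) correspond to operations GitHub exposes programmatically. The TypeScript sketch below illustrates their general shape with Octokit under assumed conditions: the repository coordinates and comment ID are placeholders, and a real moderation workflow would derive them from the event being handled rather than hard-coding them.

```typescript
// Sketch of two punitive actions expressed against GitHub's APIs via Octokit.
import { Octokit } from "@octokit/rest";

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

// Lock a heated conversation so that only collaborators can keep commenting.
async function lockHeatedThread(issueNumber: number) {
  await octokit.rest.issues.lock({
    owner: "example-org", // placeholder
    repo: "example-repo", // placeholder
    issue_number: issueNumber,
    lock_reason: "too heated", // the API also accepts off-topic, resolved, spam
  });
}

// Hide (minimize) a comment as abusive instead of deleting it, the option P3
// prefers; comment minimization is only exposed through the GraphQL API.
async function hideCommentAsAbusive(commentNodeId: string) {
  await octokit.graphql(
    `mutation ($id: ID!) {
      minimizeComment(input: { subjectId: $id, classifier: ABUSE }) {
        minimizedComment { isMinimized }
      }
    }`,
    { id: commentNodeId }
  );
}
```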
Should disagreements develop into more heated debates, moderators would institute temporary bans: \u201cThere will sometimes be very heated discussions, we may institute a one-, or even in some cases the seven-day ban, so they can cool off and then come back, refreshed, hopefully.\u201d (P5).

4.3.2 Mediations. \u201cIn an OS community, the implicit foundation of it is that all contributions are valid, or that everybody has an equal stake in doing something\u201d (P11). But disagreements occurred when maintainers and contributors had mismatched expectations for the future state of the software [43]. During such interpersonal conflicts, it fell to moderators to hear out all perspectives and mediate underlying conflicts to resolve disagreement and limit the development of toxic behaviors.

Mediation involves communicating with multiple parties involved in a conflict (individually or as a group) to resolve any misunderstandings or negotiate any conflicting objectives when collaborating on a decision in the project. P13 described how a party engaged in a conflict sought out moderators to mediate situations: \u201cYou find the moderator that is respected by both parties that are involved in the conflict. . . . Then you talk to them, if they\u2019re nice they usually agree to facilitate things. You get them to hear both sides, they take it from there.\u201d And having conducted mediations himself, P13 elaborated on the sequential process of mediation. To start off, the moderator speaks with individuals from both sides of the conflict: \u201cyou just talk to the sides, . . . you try to figure out the conflict, you try to get them to see the other person\u2019s perspective\u201d. In some cases someone actually did commit a wrongdoing or misconduct: \u201cSometimes there is a clear person who is right in the conflict . . . usually the other party will either admit that or dig in.\u201d But more likely it\u2019s just a miscommunication: \u201cOften there is not [someone in the wrong], it\u2019s just like a misunderstanding, and just getting people to see the misunderstanding and the other person\u2019s perspective is usually enough\u201d.

P13 of project Apex also recounted approaching mediation by giving all parties the benefit of the doubt: \u201cMost of these people are good people and good engineers, and there is very little malice in the project. Just assuming good faith and trying to approach it from a point of like \u2018these are reasonable, decent human beings\u2019 is often sufficient, in terms of figuring out the right side.\u201d Meanwhile, P14 of a smaller project found mediation to be a negotiatory task: \u201cIt\u2019s all about negotiation. You talk with engineer A, you tell them what you don\u2019t like. You try to talk with engineer B, you try to see if what engineer A is proposing will work with engineer B, and you try to come up with a tradeoff.\u201d

Some maintainers were happy to act as an intermediary from the beginning. For instance P1 related how \u201cI would rather be [a] middleman than to call out anyone for toxicity\u201d while P14 helped his contributors ask for clarifications: \u201cSometimes people can come to me and say: \u2018I read this, not sure how to take it, if it\u2019s personal or something\u2019.
Usually I know all interested parties and I try to ask the reviewer to rephrase the message, to clarify it.\u201d But founders of more mature projects like JaguarAPI were not as comfortable with mediating: \u201cI\u2019m trying to mediate, which is strange because that\u2019s not something I would normally do. I wouldn\u2019t normally engage in an aggressive conversation\u201d (P4). But due to the hypervisibility of the project and obligations to protect community members, P4 wound up learning how to mediate anyway: \u201cI feel like I have to protect the community and the people that are around there, around my family. So I end up having to stop whoever is kind of harassing us\u201d.

4.3.3 Preventative Strategies. Mediations and punitive strategies describe ways that moderators react to conflicts of different scales. While these techniques can be taught and directly performed by any new moderator, it takes more experience to anticipate and prevent budding or future disputes. Kiesler et al. presented ways to limit the impacts of misbehaviors as well as the performance of bad behaviors in their meta-analysis [2]. Below we categorize these two types of strategies as inhibitory and proactive preventions, where moderators used the former to prevent escalations and the latter to proactively set up workflows that prevent frustrations and ensure conformity to standards.

Inhibitory Preventions. Not all conflicts end up escalating into full-blown arguments between contributors, and most of the time it was up to human moderators to predict the onset of harmful behaviors. Inhibitory preventions involve warning-based, reproachful techniques that moderators leverage to target indirectly hostile behaviors (e.g. inappropriate jokes, passive-aggressive behaviors), so as to limit harm and avoid further escalations. Indirectly hostile behavior in open source projects is analogous to the concept of \u201ctoxicity elicitation\u201d in online text-based communities [110]: comments or behaviors that elicit highly toxic responses but do not necessarily contain toxic language themselves. The preventative actions targeting these behaviors included monitoring conversations, calling out and correcting misbehaviors, or issuing warnings.

Passive-aggressive behaviors were a classic example of indirect hostility that participants brought up, and P5 of Grunge recounts how \u201cwe always stop this. You have to nip it in the bud, because people new to the language come there to ask questions, that\u2019s always a delicate situation\u201d. To reduce the chances of newcomers dropping off, \u201cwe\u2019re extra careful there to protect those people from know-it-alls and people who just ooze negativity\u201d (P5). P11 from Silverback similarly practiced the firm enforcement of rules to prevent escalations: \u201cyou just firmly enforce it, and that itself creates a good culture because you nip these things in the bud. You don\u2019t let them escalate out\u201d (P11). In the absence of existing rules, moderators issued preventative warnings to de-escalate situations. \u201cOther than bans, it can even just be a proverbial slap on the wrist.
We call out bad behavior and if they fix it directly then that\u2019s totally okay, we appreciate that not everyone is at their best\u201d (P5).

Even though these comments were not as blatantly harmful, they did contribute to the normalization of hidden hostility:

\u201cThe problem isn\u2019t the offensive or toxic comments, that\u2019s not actually the issue. It\u2019s not actually a problem that someone is entitled, in a comment directly. It\u2019s the knock on effects of that comment, it\u2019s that other people will see that and think it\u2019s okay to behave that way, it\u2019s that other people will feel more entitled because they\u2019ve seen entitlement be normalized.\u201d (P3)

Proactive Preventions. After observing repeated instances of misconduct, moderators proactively established rules and workflow standards to avoid the repetition of similar mistakes, minimize the amount of harm that bad actors can perform, and guide new contributors toward desired standards and practices, something newcomers have been found to struggle with [43]. Specific structures include codes of conduct, private moderation repos, formatting linters, templates that help contributors better frame their questions and suggestions, or channels for organizing existing answers. While these structures were not always put in place directly after an offense, their presence created channels for support and information dissemination, thereby minimizing the questions and issues raised by users and contributors.

In the case of P8 from Apex, an entire moderation team was set up in reaction to a conflict: \u201cThe moderation team is set up in reaction to Apex botching . . . [a situation]. It was a public relations fiasco.\u201d A team member, P3, also described how, besides moderation teams, contributors set up codes of conduct after instances of conflict: \u201canyone who has run into the need for moderation or codes of conduct, is going to be very quick to implement it in a community they enter or create\u201d.

To minimize the edits that maintainers need to make to contributors\u2019 submissions, templates helped to list out necessary components to include in new pull requests or issues: \u201cIn the repository, in the topmost folder, there is a Contributing document that says: your pull request should have this title, commit messages should be in this format\u201d (P3) or assist users in drafting issues: \u201cI added a load of information to the template and a lot of requisites to ask people to build a very simple example of what is it that you want.\u201d (P4).

4.3.4 Reformative Strategies. Not all acts of misconduct are born of malicious intent [59]. Sometimes when moderators observed repeated instances of misbehaviors, they employed a more nurturing and reformative approach that does not castigate contributors for unintentional offenses. Unlike the punitive or preventative strategies that remove a member\u2019s content or right to interactions, reformative techniques are more educational and gentle, consisting of actions like polite admonishments or providing explanations. Over the long term, artifacts from reformative approaches (e.g. explanations) benefit the community by establishing acceptable behavioral standards, even if they take some time for communities to adopt.
By offering benefits such as transparency and a way to establish new norms for subsequent community members, reformative approaches have garnered increased advocacy from researchers in recent years [53, 79].

Reformative strategies were well received among open source practitioners as well. P3 of Apex related positive feedback from his community: "The polite admonishment (when I word it eloquently enough) tends to gather lots of heart and thumb-up emoji reactions, and the person will either apologize or just dip out and be quiet. So it's the most effective form of response." In a newer project, P11 redirected a raised issue to demonstrate a more efficient way to handle typos: "Thanks for pointing this out. However, instead of raising this as an issue, if you ever see small typos, please feel free to just put in a pull request to fix them". However, providing explanations is a nontrivial amount of work, so sometimes maintainers fall back on preventative strategies: "I don't always have the energy for that so sometimes I'm hostile back . . . sometimes biting comments in response are effective, at the cost of other people seeing me as a jerk, but it still establishes that behavior is not acceptable" (P3).

One side effect of politely admonishing community members is the potential loss of a contributor, but often that risk is outweighed by the knock-on effects of unaddressed misbehaviors:

"Establishing what behavior is acceptable and what is not . . . [it] is performative - it's showing everyone else in the arena that that behavior is not okay, even if that means that person is not going to improve. And while it's always preferred to rehabilitate someone, or convince them to re-evaluate. . . I'd rather lose a person forever from the community than have the rest of the community see toxic behavior go unchallenged."

In addition to polite admonishment, P2 of a new differential privacy library showed how reformative actions also offer explanations of newly established norms and practices: "New pipelines are introduced through meetings, and introducers explain why they're better, they are then more accepted by contributors after explanation". Some of these standards took a period of transition for communities to adopt, demonstrating a case of the normative conflict identified in [38]:

"When a community starts moderating it's overwhelming for a while. . . The goal is to get everyone to be as open and tolerant and respectful as possible and that goal is not . . . most efficiently achieved by immediately jumping to a list of all the things that are potentially a problem . . . each medium has to get there on its own time, in its own way so the norms can be established and everyone can accept them." (P3)

However, once communities adopted a good practice, they grew to appreciate it over the long run: "And then norms are established; they know that it's safe to admonish newcomers to behave that way. And the incidence of reports just plummets. People just don't screw up when they know what the norms are supposed to be. We'll get through the transition period and the new norms will just be the way it is, and everyone will be horrified that it used to be worse." (P3).
Among collaborators, such adoption frictions were usually mitigated by group meetings and discussion, as P2's account above of introducing new pipelines through meetings illustrates.

4.4 Automation of Moderation

Most of the interactions on social coding platforms are text-based, making them well-suited for automation when compared to their social media counterparts [58]. As a result, bots and GitHub Actions can easily leverage the available repo artifacts to facilitate various workflows and protection mechanisms for their projects [108]. Half of our participants mentioned using or considering automated tools to facilitate community moderation. While most of them did not have moderation tools set up on their repos at the time, Hyundai had a positive experience using the Sentiment Bot, and Silverback had installed the alex bot, which detects instances of "gender favoring, polarizing, race related, or other unequal phrasing in text" [109].

However, our interviewees perceived current bots to be inadequate for conducting moderation beyond simple reactive warnings. Moderators reported that community members can view automated moderation tools as over-censoring and policing forces that threaten their freedom of speech, especially given these tools' tendency to trigger falsely. Furthermore, the more subtle forms of misbehavior found in the professional space of software development (such as the misunderstandings, technical disagreements, and resistance against new practices covered in Section 4.1.2) are difficult for the language models underlying moderation tools to anticipate. Meanwhile, the tools used for moderation are seldom adapted to the development context and lack access to cross-platform information, increasing the chance of false alarms. Finally, the absence of customization options for privacy and notifications breached social boundaries between users, contributors and maintainers by exposing deletions and callouts to the public and by demanding maintainers' attention with excessive and overly public notifications. So despite the potential for bots to perform automated moderation on behalf of maintainers, many of our participants expressed concerns and adoption frictions. Below we highlight some of these existing tensions as well as maintainers' stances on the utility and impact of moderation bots.

4.4.1 Automated Moderation Breeds Over-Censorship. In public online spaces, the right to free individual expression inevitably trades off with concerns of wellbeing and public safety [49], and our interview participants perceived the potential for automation to lead to over-censorship. Free speech has long-standing associations with source code, and the metaphor was leveraged by early supporters of F/OSS to protect the right to use, modify and distribute software [27]. As a result, open source communities have strongly embraced and valued the right to free speech. However, in volunteer-based development contexts, it is also important to foster a safe space that welcomes contributions from everyone, especially given the limited diversity and inclusion in modern open source communities [71]; P8 of Apex points out this shortcoming: "As with a lot of organizations . . . we struggle with representation." But as Gibson has found, moderators use punitive strategies more within safe spaces [46], which leads to a sense of over-censorship.
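To make the kind of automation under discussion concrete, the following is a minimal, hypothetical sketch of an alex-style comment checker in the mold of the bots named above. It is not any participant's actual setup: it assumes Probot's webhook API (including its isBot convenience) and alex's classic entry point, where calling alex on a piece of text returns a report whose `messages` array lists flagged phrases.

```typescript
// Hypothetical sketch of an alex-style moderation bot built on Probot.
// Assumes alex's classic CommonJS API: alex(text).messages lists flags.
import { Probot } from "probot";
import alex from "alex";

export default (app: Probot) => {
  app.on("issue_comment.created", async (context) => {
    if (context.isBot) return; // avoid reacting to our own warnings

    const flagged = alex(context.payload.comment.body).messages;
    if (flagged.length === 0) return;

    // Every hit becomes a public reply; this visibility is exactly what
    // participants experienced as policing when the hit is a false positive.
    const warning = [
      "Heads up: some phrasing here may read as insensitive:",
      ...flagged.map((m: { reason: string }) => `- ${m.reason}`),
      "If this is a technical false positive, feel free to disregard.",
    ].join("\n");
    await context.octokit.issues.createComment(
      context.issue({ body: warning })
    );
  });
};
```

Even a well-meaning deployment like this publishes a callout for every hit, which is precisely the dynamic participants weighed against the value of a welcoming space.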
P8's teammate P3 described the two opposing value systems:

"There's a spectrum of how much we want to modify our language to avoid offending people. And the folks who generally resist political correctness are the ones who are on the side of saying, 'I'm going to say whatever I want, if you're offended that's your problem' and I think that sucks and I don't want to be there. But the other extreme is problematic too because it damages the message by turning into policing."

Maintainers delicately balanced this friction between individual contributors' desire for free speech and the community's need to create a welcoming space for female, LGBTQ, and other underrepresented groups [71, 105]. Depending on contextual needs, maintainers weighed the freedom of expression for contributors against broader project goals of promoting a respectful and inclusive community, or as P14 eloquently expressed: "the tradeoff between this [moderation] and freedom. You have to balance, don't want to restrict people, but you also want everybody to play nice."

While some maintainers struggled with the tradeoff between free expression and enforced civility, P10 (founder of Silverback) expressed active resistance against demands for free speech: "This is a slippery slope, [as if] we're going to 'lose all of our words' and 'it's going to be 1984' and 'we can't express ourselves'." To assert that automations are not powerful enough to suppress contributors' creativity and right to speech, P10 challenged users and contributors who complained to test out AlexJS, telling them that: "if you lose more than like 15 words in a year [then] we can rediscuss this. But . . . you're smart enough and creative enough, and the language is large enough, that you're going to be just fine".

Most of the automation assistance our participants considered focused on language use, and less so on other forms of misbehavior. However, excessive attention to and moderation of language misuse also derailed conversation topics away from the development of software: "even the people who were philosophically aligned with the idea of avoiding gender words were still irritated that the normal topic of the channel was distracted or disrupted by that conversation so frequently" (P3). Such concerns are another set of nuances that bots would have a hard time taking into account: "if you get a tool like that – it can legitimately be seen as being too pedantic or too tightly wound about certain words. And [if] the culture of that community isn't ready for that yet, then that's worse than not saying anything" (P3).

4.4.2 Perceived Technical Limitations of Automating Moderation. Participants perceived that moderation bots deployed on GitHub doubly suffered from context specificity due to 1.) situation-specific nuances that are difficult for current tools to pick up on and 2.) technical terms used in software development environments. On the one hand, a lack of nuanced understanding of situational contexts made it difficult for models to detect new and more subtle variations of misbehaviors. On the other hand, the underlying language models lacked contextual sensitivity to technical terms, triggering false positives that require additional human labor to review.
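A toy example, with an invented word list and comment (not drawn from any real tool), illustrates how the second shortcoming arises: any checker that scores terms without modeling developer jargon will flag routine engineering language.

```typescript
// Hypothetical sketch: why generic negative-word lexicons misfire on
// developer jargon. The lexicon and comment below are invented.
const NEGATIVE_LEXICON = ["kill", "abort", "dead", "dummy", "master", "slave"];

function flaggedTerms(comment: string): string[] {
  const words = comment.toLowerCase().match(/[a-z]+/g) ?? [];
  return NEGATIVE_LEXICON.filter((term) => words.includes(term));
}

// Routine engineering talk trips the filter four times, although
// nothing in the comment is hostile:
console.log(
  flaggedTerms(
    "Kill the zombie process, remove the dead code and the dummy fixture, then rebase onto master."
  )
);
// -> ["kill", "dead", "dummy", "master"]
```

Production sentiment models are subtler than a word list, but the "master" and "bone" anecdotes our participants relate below follow the same pattern.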
The combination of such shortcomings caused hesitation in delegating moderation responsibilities to automation tools.

Inability to Anticipate False Negatives. Human collaborators can easily retrieve information distributed across platforms by toggling between them, but most automation tools can only access a single deployment context or platform: "a lot of the mediums [where] they have discussions are not in public – the bot wouldn't have access" (P13). Without multiplatform contextual clues, bots failed to pick up on interpersonal relationships or intended meanings in a working environment, and "even inside the discussion there is a lot of background [and] the bot would have a very hard time to figure it out" (P13). For instance, P6 of Hyundai pointed out that "when people are passive aggressive, the bot cannot understand that, and it's better to interact with a human". Hence, much like the moderators of social media platforms, open source moderators prefer human-reviewed decisions "for the interpersonal conflict inside of projects [since] it would be impossible" for automated assistance of moderation to work [94]. P3 also shared the stance that humans should be involved in moderation decisions: "If it was just maintainers that see it: fine, I can make a judgment call".

Context Specificity Raises False Positives. General-purpose sentiment analysis models struggle to pick up on the connotations of context-dependent terms, causing learning-based models to trigger falsely on common software engineering terms that carry negative connotations in everyday contexts. Consequently, maintainers had to manually review such false-positive triggers to ensure accuracy, straining their already limited bandwidth. For instance, P3 described how AlexJS could offend individuals by aggressively flagging words with slightly negative connotations: "AlexJS . . . ended up . . . [being similar to] the archeology conference where they couldn't say the word bone, because some software flagged it as offensive." In project Silverback, P10 also observed how "Alex triggers on 'master' because of [the upstream dependency] Vessel", since project Vessel had not yet renamed its 'master' branch to 'main'.

One particular example of contextual information missing from detection models was a contributor's primary language. When someone's native tongue is not English, their comments can accidentally trigger bot reactions by unintentionally using phrases that carry negative connotations and innuendos: "English not being a first language may affect them." (P8). Li et al. have highlighted how moderators must use intuition (i.e. guessing) to distinguish behaviors needing intervention from unintentional offenses caused by language differences [71].

Participants also worried about how bots would treat self-directed anger. In one instance, the founder of project Hyundai "was answering a question and said 'Oh yes, don't worry about that feature, it was a rubbish feature and we already fixed it' and the bot was triggered." (P6).
Likewise, P10 "worries about the use of negative language in code due to personal experience writing code (mostly self-directed)." While P6 was able to find humor in the situation ("Sometimes the Sentiment Bot flags not so aggressive phrases and it's a funny occurrence"), other instances may be more frustrating, especially if the triggering words are frequently used.

Most maintainers hold the opinion that false positives cause harm, especially if they add noisy information: "false positives become a big problem, especially because they're a distraction" (P5). But whether false results cause disruption depends on context: "Sometimes false positives are acceptable, better than missing something . . . but there's also sometimes where false anything is not acceptable and it's better to say nothing, than to have a false result." (P3). Too many false triggers can numb maintainers to warnings (a behavior consistent with the findings of Wessel et al. [108]), leading them to perceive warnings as noise and thereafter ignore them altogether: "We have something called a stale bot . . . it periodically will just put comments on tickets and send emails, which is not bad. But for whatever reason we've learned to ignore it sometimes." (P10).

4.4.3 Customizations and Boundaries. Participants reported a strong need to tweak and customize tooling based on specific project needs. In existing automation tools, the lack of personalization options harmed adoption rates. For instance, the absence of notification settings caused information overload and fatigue for maintainers, especially given the possibility of the abovementioned false-positive triggers.

In Sections 4.2.1 and 4.2.4 we discussed the distribution of moderation work to volunteer contributors. P3 brought up a built-in GitHub feature that supports this volunteer reporting framework: "GitHub has this reporting facility that I don't think too many people know to turn on. It allows arbitrary users to say this is a problem, pay attention". Unfortunately, the feature lacks fine-grained configuration: "In [Apex] we actually turned that off (it was on for a while) because you can only turn it on for the whole org or none of it."

Privacy is another setting that maintainers wanted to configure. While transparency has been found to improve collaboration in open source [30], not all maintainers are ready "to be that transparent, and that direct", and P3 is of the mindset that completely transparent configurations are "going to have consequences that those folks didn't anticipate and that a private system allows for more bias". However, they also conceded later that privacy "also allows . . . for a more refined response"; hence it is important to have the agency to configure private notifications: "Having it surface just so I notice it . . . [it should also be that] I can also tweak how sensitive it is, instead of having a default setting".

There exist yet other bots that are intended to lessen the burden on maintainers, but that have instead crossed social boundaries that maintainers were not entirely comfortable with.
For example, P3 worried about how the automatic closing of issues deprioritized the time of community contributors: "I don't use Probot at all primarily [because] most of the usage of it I've seen has been programmatically closing issues (like stale issues) and I think that's insanely user hostile . . . prioritizing maintainer time over the feelings of users and I think that's not a good trade off." In a similar vein, P1 also claimed he "would not consider anything that'd directly communicate with the contributor, because we value every single one of them"; allowing direct communication between automated moderation tools and contributors could risk offending and losing valuable community members.

4.4.4 Anticipated Role of Bots in Moderation. Perhaps due to the shortcomings outlined above, maintainers of repos of various sizes indicated that projects should not (solely) depend on automation for moderation. As a moderator of the popular project Grunge, P5 thought a bot's presence would be extraneous: "the best thing it could do was alert as to situations that may arise. But then again, people already do that". As the creator of the more nascent Silverback project, P11 also thought that his community "should never need it, other than catching slip-ups", because "if we're relying on the bot to solve moderation problems, we've gone so far off course".

Fortunately, the future for bot adoption is not entirely dismal. P8 expressed appreciation for the depersonalized nature of automated interventions ("it's nice that the tool depersonalizes the intervention"), suggesting that they may have a place in initiating interpersonal interactions such as mediations. But like bots on any platform, user abuse is a possibility: "as soon as a repo has that system on it . . . a bunch of people are going to go brigade it and just drop every offensive word they can think . . . just to see how much they can respond" (P3).

| | Self-moderation | Volunteer Moderators | Moderation Teams |
|---|---|---|---|
| **Punitive** | Current bot interventions are suitable for reactive self-moderation, but customizations, contextual sensitivity and higher accuracy can increase usage. | Current volunteer moderators often experience false-positive triggers from moderation bots; customized notifications, increased accuracy and contextual sensitivity can encourage adoption. | Bot interventions can help improve the efficiency of content moderation in large projects with moderation teams. There are opportunities for bots to help team members make collaborative decisions or onboard new members. |
| **Mediations** | N/A (Conflicts involving mediation have usually escalated beyond self-moderation.) | Bots can ask for clarifications on behalf of a contributor (acting as a mediator) in place of moderators. | Bots can help depersonalize mediations, but there is room for improvement in detecting situations in need of mediation in large projects. |
| **Preventative** | Inhibitory: There are opportunities for detecting instances of potential toxicity, such as indirect hostility, that could develop into more serious conflicts (e.g. passive aggressiveness, inside jokes, minor transgressions). Proactive: Bots can provide suggestions of improved workflows after observing repeated mistakes and nonconformity to existing standards. | | |
| **Reformative** | Bots can help enforce template use and surface rules, community guidelines and codes of conduct to writers when they are composing a potentially harmful comment. | | |

Table 3. Design Recommendations: how automation may support moderation structures and strategies

Additionally, our participants considered scenarios where moderation bots can be leveraged to execute some of the Moderation Strategies – Table 3 overviews some ways that bots can support moderation in the future. For situations needing an immediate response, such as warnings administered through Punitive Strategies, P12 recalled an instance of a demanding user where a bot could have intervened. The user had commented "THIS BUG HAS BEEN OPEN FOR A YEAR, WHERE IS THE FIX AT" in all caps: "I can definitely see using it [a moderation bot] for something like that." P4 also contemplated a situation where the Sentiment Bot could have taken on the frontline, reactive work of moderation: "if it was able to take a lot of those first conversations I think that will be very useful". P8 imagined that such a tool could help self-moderation by alerting well-intentioned commenters when they accidentally make a mistake: "This is gonna be good for people who are good faith commenters, it's not gonna be effective for the trolls". In communities where malpractices were pervasive, P14 imagined that moderation bots could help facilitate reform: "If I see this type of behavior becoming a bad practice in the team/overall community, I would definitely consider doing [adopting] something like that".

5 DISCUSSION

Through our examination of moderation norms and practices among communities of various sizes, we found a diverse set of structures and practices that maintainers leverage to manage and prevent conflicts. While self-moderation and volunteer-based moderation have pervaded and been well-studied in neighboring communities such as Wikipedia [40] and Stack Overflow [25], we found that moderation in open source required a different set of strategies and, in the case of larger projects, more formal structures such as moderation teams. We also discovered that there are still many gaps in the forms of moderation assistance that bots can offer, both in terms of whom they serve and the types of moderation strategies they automate. Inspired by some speculations of our participants, we present below a comparison between the moderation structures, strategies and opportunities for automation in open source versus other platforms, as well as some design recommendations to help guide the future of automation tools for moderation.

5.1 Moderation in open source versus other platforms

5.1.1 Moderated Content. In terms of content, prior work on content moderation in social media largely documented the presence of more explicit forms of misbehavior, such as the infamous triad of "flaming, spamming and virtual rape," among other forms of inappropriate content such as hate speech, insults or harassment [24, 56, 58, 94]. In our discussions with practitioners on GitHub, we gathered that moderators also watched out for more borderline actions such as technical disagreements or resistance to new norms, which may not be as immediately apparent. The evidence of such subtle forms of disputes means that moderators are more likely to leverage Mediations as an approach to conflicts between contributors.
Another implication is that automated tools powered by language models are unlikely to detect these less obvious misbehaviors, because not only are they subtle, but they also tend to be situational and technical – and therefore highly context-dependent.

5.1.2 Structures and Roles. While prior works discuss the potential of self-moderation in community-based platforms such as Facebook groups, Wikipedia and Reddit [17, 90], most considered moderation to be a community-level effort where platform peers helped one another moderate, similar to the volunteer moderation that we attribute to community members in this study. Among our participants, self-moderation was considered an individually initiated action where contributors self-monitor and edit their own content. Such behaviors are likely to benefit from automated assistance, as presented in Table 3. Our terminology is consistent with one other study on YouTube [72], while another investigation of subreddits called the phenomenon self-censorship [46].

Many of the communities that practice community-level self-moderation rely on volunteers to conduct moderation, as opposed to more centralized models of corporate moderation [58, 94]. However, past work suggests that the reliance on online volunteers to conduct moderation labor may be exploitative, meriting re-examination from an ethical perspective [70]. Our results revealed that moderators in OSS shared governing powers with higher-up authorities such as the TSC, as well as with community members more broadly. Prior work suggests that such mechanisms for distributing power across multiple hierarchical levels are beneficial and expected for larger projects, arguing that 1.) power limitations on moderators can increase the perceived legitimacy of their decisions [2], and 2.) the growth of communities increases the decentralization of moderation on platforms such as Wikipedia [40]. The establishment of formal structures (such as the moderation teams we introduced in Section 4.2.2) has been found to improve the communication of norms to newcomers [40], perhaps by increasing the usage of actions such as Reformative Strategies.

5.1.3 Moderation Strategies. Punitive actions such as the hiding or deletion of content as well as the banning and calling out of rule-breaking behaviors resemble many of the organizing actions found on Reddit, Discord and Twitter [24, 56, 58], and we found evidence that such strategies are transferable to the OSS context. Similarly, the inhibitory warnings used for preventing conflict resemble norm-setting practices adopted by moderators on Wikipedia, Facebook, Twitch, as well as Reddit [39, 94]. The transition from inhibitory warnings to punitive actions reflects Ostrom's fifth design principle of graduated sanctions [83]; and though our participants did not explicitly discuss such escalations, we encourage future work to more closely examine their prevalence in OSS contexts. Ferreira et al. [36] advocated for both proactive and reactive (or punitive) approaches for addressing known issues and conducting damage control, and our findings provide evidence of moderators employing such strategies in practice. Finally, mediation was a strategy almost never observed in the extant literature, except on Wikipedia, perhaps because of its similarities with open source as a collaborative peer-production platform [12].

5.1.4 Usage and Perception of Automation.
Prior works on Wikipedia have found semi- and fully-automated tools valuable in providing moderators with an information infrastructure that connected editors in a decentralized network and facilitated valuation, negotiation, and administration, thereby enabling new moderating actions independent of existing norms [44]. For open source, Ferreira et al. anticipated the deployment of similarly automated assistance for moderation [36], yet many others critiqued that existing toxicity detectors are not yet tailored enough for the software engineering context [10, 36, 60], which our findings corroborated. Beyond the challenges induced by limited domain adaptation, we additionally uncovered the presence of subtle misbehaviors that may contribute to the inability of such models to anticipate more nuanced situations. Lastly, we highlighted how the absence of customization options caused maintainers to resist adoption, the implementation of which is made difficult by the lack of transparency in the underlying black-box models [42, 62].

5.2 Design Implications for Automating Moderation

**Punitive Strategy.** The first set of strategies we uncovered were employed at the early stages of conflict; these included punitive measures that halted escalation and removed toxic content. Presently, we found that moderation bots are of most assistance to human contributors in this reactive capacity, i.e., by pointing out cases of rule violations and harmful content so that authors can become aware when they unintentionally compose inappropriate content. However, when bots were tasked with calling out bad behaviors, our participants observed that they were prone to hypersensitivity, causing false-positive triggers. Such false alarms do not scale well and negatively impact moderators by adding to their already overloaded maintenance burdens [43, 108], making existing tools helpful only for cases of self-moderation. To extend the scope of reactive support toward volunteer moderators and formal moderation teams, the sentiment models underlying current moderation bots need greater contextual sensitivity to the nuances of language used in software engineering in order to increase accuracy. More customization options can also be incorporated into these tools to increase transparency, explainability and trust within the community.

**Mediation Strategy.** Once conflicts developed, moderators engaged in different approaches to mitigate and resolve issues, depending on the specifics of the situation. When encountering disagreements among small parties of contributors, moderators took mediating actions to reconcile differences. During mediations, moderation bots help facilitate depersonalized interventions between contributors, but further advancements can help moderators detect disputes that require mediation and ask for clarifications from a fellow contributor when one side is uncertain about the presence of conflict or the potentially negative connotations of a comment.

**Preventative Strategy.** When contributors engaged in more indirect forms of toxicity, such as passive aggressiveness or inappropriate jokes, maintainers leveraged inhibitory preventions to limit the extent of bad behaviors. Bots can support moderators by expanding their detection scope to include such forms of indirect hostility.
After repeated instances of behavioral mistakes occurred among different contributors, moderators proactively established new rules and standards to prevent future violations. To contribute toward proactive preventions, bots can help monitor and detect repeated offenses, identify the associated workflows that cause such nonconformity, and suggest improvements based on practices observed in other communities.

**Reformative Strategy.** For mistakes repeated by multiple contributors, moderators took reformative approaches, setting up standards and proactively preventing future cases of similar violations by introducing new rules and workflows. To help moderators initiate reformation in the community, automation can be utilized to surface existing guidelines in real time while authors are writing content, so as to prevent the public posting of potentially harmful content.

### 5.3 Relation between Workflow Automations and User Frustration

While our study did not set out to identify the types of technical and interpersonal conflicts that lead to toxic and uncivil behaviors in open source, an emergent theme pointed to entitlement and user frustration, especially among more prominent projects with larger user bases, highlighting the shortage of technical support for users and contributors. These problems usually surfaced when participants discussed the types of strategies or workflows they used or set up (mostly described in Sections 4.3.3 and 4.4.1), many of which were established to reduce the mass of incoming questions and requests. While prior work has touched upon how the time-sensitive and never-ending chore of user support is one of the most emotionally draining tasks for maintainers [43, 100], little is known about the types of technical complexities and misunderstandings that cause such extensive amounts of frustration. Future work can seek to address this missing link by connecting the specific forms of technical (or interpersonal) issues that cause these emotionally charged conflicts with the mitigation strategies that maintainers described to us above.

### 5.4 Limitations

Our results indicated three main themes around moderation and the potential of automation in open source, which we presented in the previous sections. However, despite our efforts to recruit a diverse group of participants and projects for our interviews (with a particular focus on a variety of project sizes), we do not claim that our sample is representative of all open source developers. The number of interviews we conducted and the snowball sampling technique both limit the representativeness of our sample. We also focused solely on projects hosted on GitHub, which means that the scope of our theory and results may not generalize to other social coding platforms, such as Bitbucket or GitLab. Furthermore, while it would have been ideal to highlight the experiences and perspectives of more marginalized and underrepresented groups in open source, the scarce availability of such participants did not present us with the opportunity – we encourage future work to explore this gap in our understanding of moderators in OSS.

### 6 CONCLUSION

In this paper, we examined moderation practices in open source communities by conducting 14 semi-structured interviews with moderators and maintainers. Specifically, we characterized the norms, roles and practices of who performs moderation and how different techniques are employed in various contexts (RQ 1).
We further investigated automation tools for moderation, identifying concerns that hinder adoption as well as potential ways that future bots can support different groups of moderators in various capacities. Based on the implications of these results, we presented a set of design recommendations for practitioners and researchers, which can guide the future development of automation tools for moderation.

ACKNOWLEDGMENTS

This work was supported by the National Science Foundation (NSF) under Award Nos. 1939606, 2001851, 2000782 and 1952085. We are grateful to Allen Yao, Pranav Khadpe, Jim Herbsleb, Christian Kästner, David Widder, as well as the anonymous reviewers, for their crucial input and feedback on the initial and subsequent drafts of this work. Finally, we would like to thank our participants for offering us their time to share their expertise and insights.

REFERENCES

[1] 2011. https://code.dblock.org/2011/07/14/github-is-your-new-resume.html.
[2] 2012. Regulating behavior in online communities. In Building Successful Online Communities. The MIT Press.
[3] 2021. Safe space – GitHub action. https://github.com/charliegerard/safe-space.
[4] 2021. sentiment-bot. https://github.com/behaviorbot/sentiment-bot.
[5] 2022. https://github.com/search?q=type:user&type=Users.
[6] 2022. Adding a code of conduct to your project. https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/adding-a-code-of-conduct-to-your-project.
[7] 2022. GitHub Acceptable Use Policies. https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies.
[8] 2022. Moderating comments and conversations. https://docs.github.com/en/communities/moderating-comments-and-conversations.
[9] n.d. Wrangling Web Contributions: How to Build a CONTRIBUTING.md. https://mozillascience.github.io/working-open-workshop/contributing/.
[10] Toufique Ahmed, Amiangshu Bosu, Anindya Iqbal, and Shahram Rahimi. 2017. SentiCR: A customized sentiment analysis tool for code review interactions. In 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE). 106–111.
[11] K D Singh Arneja. 2015. Code reviews do not have to be stressful. https://medium.com/idyllic-geeks/code-reviews-do-not-have-to-be-stressful-919e0a8377a1. Accessed: 2022-7-13.
[12] Matt Billings and Leon A Watts. 2010. Understanding dispute resolution online: using text to reflect personal and substantive issues in conflict. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Atlanta, Georgia, USA) (CHI '10). Association for Computing Machinery, New York, NY, USA, 1447–1456.
[13] Christian Bird. 2011. Sociotechnical coordination and collaboration in open source software. In 2011 27th IEEE International Conference on Software Maintenance (ICSM). 568–573.
[14] Christian Bird, David Pattison, Raissa D'Souza, Vladimir Filkov, and Premkumar Devanbu. 2008. Latent social structure in open source projects. In Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of software engineering. 24–35.
[15] Cássio Castaldi Araujo Blaz and Karin Becker. 2016. Sentiment analysis in tickets for IT support. In Proceedings of the 13th International Conference on Mining Software Repositories (Austin, Texas) (MSR '16). Association for Computing Machinery, New York, NY, USA, 235–246.
[16] Amiangshu Bosu and Jeffrey C Carver. 2013. Impact of Peer Code Review on Peer Impression Formation: A Survey.
In 2013 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement. 133–142.
[17] Lia Bozarth, Jane Im, Christopher Quarles, and Ceren Budak. 2023. Wisdom of Two Crowds: Misinformation Moderation on Reddit and How to Improve this Process—A Case Study of COVID-19. (2023).
[18] Fabio Calefato, Filippo Lanubile, Federico Maiorano, and Nicole Novielli. 2018. Sentiment Polarity Detection for Software Development. Empirical Software Engineering 23, 3 (June 2018), 1352–1382.
[19] Stevie Chancellor, Andrea Hu, and Munmun De Choudhury. 2018. Norms Matter: Contrasting Social Support Around Behavior Change in Online Weight Loss Communities. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (Montreal QC, Canada) (CHI '18, Paper 666). Association for Computing Machinery, New York, NY, USA, 1–14.
[20] Stevie Chancellor, Yannis Kalantidis, Jessica A Pater, Munmun De Choudhury, and David A Shamma. 2017. Multimodal Classification of Moderated Online Pro-Eating Disorder Content. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (Denver, Colorado, USA) (CHI '17). Association for Computing Machinery, New York, NY, USA, 3213–3226.
[21] Eshwar Chandrasekharan, Chaitrali Gandhi, Matthew Wortley Mustelier, and Eric Gilbert. 2019. Crossmod: A Cross-Community Learning-based System to Assist Reddit Moderators. Proc. ACM Hum.-Comput. Interact. 3, CSCW (Nov. 2019), 1–30.
[22] Eshwar Chandrasekharan, Shagun Jhaver, Amy Bruckman, and Eric Gilbert. 2022. Quarantined! Examining the Effects of a Community-Wide Moderation Intervention on Reddit. 26 pages.

[23] Eshwar Chandrasekharan, Umashanthi Pavalanathan, Anirudh Srinivasan, Adam Glynn, Jacob Eisenstein, and Eric Gilbert. 2017. You Can't Stay Here: The Efficacy of Reddit's 2015 Ban Examined Through Hate Speech. Proc. ACM Hum.-Comput. Interact. 1, CSCW (Dec. 2017), 1–22. https://doi.org/10.1145/3134666

[24] Eshwar Chandrasekharan, Umashanthi Pavalanathan, Anirudh Srinivasan, Adam Glynn, Jacob Eisenstein, and Eric Gilbert. 2017. You Can't Stay Here: The Efficacy of Reddit's 2015 Ban Examined Through Hate Speech. Proceedings of the ACM on Human-Computer Interaction 1, CSCW (2017), 1–22. https://doi.org/10.1145/3134666

[25] Jithin Cheriyan, Bastin Tony Roy Savarimuthu, and Stephen Cranefield. 2020. Norm violation in online communities – A study of Stack Overflow comments. (April 2020). arXiv:2004.05589 [cs.SI]

[26] Sophie Cohen. 2021. Contextualizing toxicity in open source: a qualitative study. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. Association for Computing Machinery, New York, NY, USA, 1669–1671.

[27] Gabriella Coleman. 2009. Code is Speech: Legal tinkering, expertise, and protest among free and open source software developers. Cult. Anthropol. 24, 3 (Aug. 2009), 420–454.

[28] John W Creswell and Cheryl N Poth. 2016. Qualitative inquiry and research design: Choosing among five approaches. Sage Publications.

[29] Laura Dabbish, Colleen Stuart, Jason Tsay, and James Herbsleb. 2012. Leveraging transparency. IEEE Software 30, 1 (2012), 37–43.

[30] Laura Dabbish, Colleen Stuart, Jason Tsay, and Jim Herbsleb. 2012. Social coding in GitHub: transparency and collaboration in an open software repository.
In Proceedings of the ACM 2012 conference on Computer Supported Cooperative Work (Seattle, Washington, USA) (CSCW '12). Association for Computing Machinery, New York, NY, USA, 1277–1286.

[31] Erik Dietrich. 2020. How to Deal with an Insufferable Code Reviewer. Retrieved September (2020).

[32] Carolyn D Egelman, Emerson Murphy-Hill, Elizabeth Kammer, Margaret Morrow Hodges, Collin Green, Ciera Jaspan, and James Lin. 2020. Predicting Developers' Negative Feelings about Code Review. In 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE). 174–185.

[33] Nadia Eghbal. 2016. Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure. Ford Foundation.

[34] Nadia Eghbal. 2020. Working in public: the making and maintenance of open source software. Stripe Press.

[35] Linda Erlenhov, Francisco Gomes de Oliveira Neto, and Philipp Leitner. 2020. An empirical study of bots in software development: characteristics and challenges from a practitioner's perspective.

[36] Isabella Ferreira, Jinghui Cheng, and Bram Adams. 2021. The "Shut the f**k up" Phenomenon: Characterizing Incivility in Open Source Code Review Discussions. 35 pages.

[37] Isabella Ferreira, Ahlaam Rafiq, and Jinghui Cheng. 2022. Incivility Detection in Open Source Code Review and Issue Discussions. (June 2022). arXiv:2206.13429 [cs.SE]

[38] Anna Filippova and Hichang Cho. 2015. Mudslinging and Manners: Unpacking Conflict in Free and Open Source Software. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing (Vancouver, BC, Canada) (CSCW '15). Association for Computing Machinery, New York, NY, USA, 1393–1403.

[39] Andrea Forte and Amy Bruckman. 2008. Scaling consensus: Increasing decentralization in Wikipedia governance. In Proceedings of the 41st Annual Hawaii International Conference on System Sciences (HICSS 2008). IEEE, 157–157.

[40] Andrea Forte, Vanesa Larco, and Amy Bruckman. 2009. Decentralization in Wikipedia Governance. Journal of Management Information Systems 26, 1 (July 2009), 49–72.

[41] Daviti Gachechiladze, Filippo Lanubile, Nicole Novielli, and Alexander Serebrenik. 2017. Anger and Its Direction in Collaborative Software Development. In 2017 IEEE/ACM 39th International Conference on Software Engineering: New Ideas and Emerging Technologies Results Track (ICSE-NIER). 11–14.

[42] R Stuart Geiger and Aaron Halfaker. 2016. Open algorithmic systems: lessons on opening the black box from Wikipedia. AoIR Selected Papers of Internet Research (2016).

[43] R Stuart Geiger, Dorothy Howard, and Lilly Irani. 2021. The Labor of Maintaining and Scaling Free and Open-Source Software Projects. Proc. ACM Hum.-Comput. Interact. 5, CSCW1 (April 2021), 1–28.

[44] R Stuart Geiger and David Ribes. 2010. The work of sustaining order in wikipedia: the banning of a vandal. In Proceedings of the 2010 ACM conference on Computer supported cooperative work (Savannah, Georgia, USA) (CSCW '10). Association for Computing Machinery, New York, NY, USA, 117–126.

[45] R Stuart Geiger, Nelle Varoquaux, Charlotte Mazel-Cabasse, and Chris Holdgraf. 2018. The Types, Roles, and Practices of Documentation in Data Analytics Open Source Software Libraries. Comput. Support. Coop. Work 27, 3 (Dec. 2018), 767–802.

[46] Anna Gibson. 2019. Free Speech and Safe Spaces: How Moderation Policies Shape Online Discussion Spaces. Social Media + Society 5, 1 (Jan.
2019), 2056305119832588.

[47] Joanne E Gray and Nicolas P Suzor. 2020. Playing with machines: Using machine learning to understand automated copyright enforcement at scale. *Big Data & Society* 7, 1 (Jan. 2020), 2053951720919963.

[48] James Grimmelmann. 2015. The virtues of moderation. *Yale JL & Tech.* 17 (2015), 42.

[49] Ted Grover and Gloria Mark. 2019. Detecting Potential Warning Behaviors of Ideological Radicalization in an Alt-Right Subreddit. *ICWSM* 13 (July 2019), 193–204.

[50] Emitza Guzman, David Azócar, and Yang Li. 2014. Sentiment analysis of commit comments in GitHub: an empirical study. In *Proceedings of the 11th Working Conference on Mining Software Repositories* (Hyderabad, India) (MSR 2014). Association for Computing Machinery, New York, NY, USA, 352–355.

[51] Emitza Guzman and Bernd Bruegge. 2013. Towards emotional awareness in software development teams. In *Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering* (Saint Petersburg, Russia) (ESEC/FSE 2013). Association for Computing Machinery, New York, NY, USA, 671–674.

[52] Wenjian Huang, Tun Lu, Haiyi Zhu, Guo Li, and Ning Gu. 2016. Effectiveness of Conflict Management Strategies in Peer Review Process of Online Collaboration Projects. In *Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing* (San Francisco, California, USA) (CSCW '16). Association for Computing Machinery, New York, NY, USA, 717–728.

[53] Shagun Jhaver, Darren Scott Appling, Eric Gilbert, and Amy Bruckman. 2019. Did you suspect the post would be removed? *Proc. ACM Hum.-Comput. Interact.* 3, CSCW (Nov. 2019), 1–33.

[54] Shagun Jhaver, Iris Birman, Eric Gilbert, and Amy Bruckman. 2019. Human-Machine Collaboration for Content Regulation: The Case of Reddit Automoderator. *ACM Trans. Comput.-Hum. Interact.* 26, 5 (July 2019), 1–35.

[55] Shagun Jhaver, Amy Bruckman, and Eric Gilbert. 2019. Does Transparency in Moderation Really Matter? User Behavior After Content Removal Explanations on Reddit. *Proc. ACM Hum.-Comput. Interact.* 3, CSCW (Nov. 2019), 1–27.

[56] Shagun Jhaver, Sucheta Ghoshal, Amy Bruckman, and Eric Gilbert. 2018. Online Harassment and Content Moderation: The Case of Blocklists. *ACM Trans. Comput.-Hum. Interact.* 25, 2 (March 2018), 1–33.

[57] Shagun Jhaver, Sucheta Ghoshal, Amy Bruckman, and Eric Gilbert. 2018. Online harassment and content moderation: The case of blocklists. *ACM Transactions on Computer-Human Interaction (TOCHI)* 25, 2 (2018), 1–33.

[58] Jialun Aaron Jiang, Charles Kiene, Skyler Middler, Jed R Brubaker, and Casey Fiesler. 2019. Moderation Challenges in Voice-based Online Communities on Discord. *Proc. ACM Hum.-Comput. Interact.* 3, CSCW (Nov. 2019), 1–23.

[59] Jialun Aaron Jiang, Peipei Nie, Jed R Brubaker, and Casey Fiesler. 2022. A Trade-off-centered Framework of Content Moderation. (June 2022). arXiv:2206.03450 [cs.HC]

[60] Robbert Jongeling, Subhajit Datta, and Alexander Serebrenik. 2015. Choosing Your Weapons: On Sentiment Analysis Tools for Software Engineering Research. *2015 IEEE International Conference on Software Maintenance and Evolution (ICSME)* (2015), 531–535. https://doi.org/10.1109/icsm.2015.7332508

[61] Robbert Jongeling, Proshanta Sarkar, Subhajit Datta, and Alexander Serebrenik. 2017. On negative results when using sentiment analysis tools for software engineering research. *Empirical Software Engineering* 22, 5 (Oct.
2017), 2543–2584.

[62] Prerna Juneja, Deepika Rama Subramanian, and Tanushree Mitra. 2020. Through the Looking Glass: Study of Transparency in Reddit's Moderation Practices. *Proc. ACM Hum.-Comput. Interact.* 4, GROUP (Jan. 2020), 1–35.

[63] Rajdeep Kaur and Kuljit Kaur. 2022. Insights into Developers' Abandonment in FLOSS Projects. 731–740 pages.

[64] Terhi Kilamo, Valentina Lenarduzzi, Tuukka Ahoniemi, Ari Jaaksi, Jurka Rahikkala, and Tommi Mikkonen. 2020. How the Cathedral Embraced the Bazaar, and the Bazaar Became a Cathedral. In *Open Source Systems*. Springer International Publishing, 141–147.

[65] Karim R Lakhani and Eric von Hippel. 2004. How open source software works: "free" user-to-user assistance. In *Produktentwicklung mit virtuellen Communities*. Springer, 303–339.

[66] Cliff Lampe and Paul Resnick. 2004. Slash(dot) and burn: distributed moderation in a large online conversation space. In *Proceedings of the SIGCHI Conference on Human Factors in Computing Systems* (Vienna, Austria) (CHI '04). Association for Computing Machinery, New York, NY, USA, 543–550.

[67] Noam Lapidot-Lefler and Azy Barak. 2012. Effects of anonymity, invisibility, and lack of eye-contact on toxic online disinhibition. *Comput. Human Behav.* 28, 2 (March 2012), 434–443.

[68] Nolan Lawson. 2017. What it feels like to be an open-source maintainer. *Read the Tea Leaves*. https://nolanlawson.com/2017/03/05/what-it-feels-like-to-be-an-open-source-maintainer (2017).

[69] Charlotte P Lee, Paul Dourish, and Gloria Mark. 2006. The human infrastructure of cyberinfrastructure. In *Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work* (Banff, Alberta, Canada) (CSCW '06). Association for Computing Machinery, New York, NY, USA, 483–492.

[70] Hanlin Li, Leah Ajmani, Moyan Zhou, Nicholas Vincent, Sohyeon Hwang, Tiziano Piccardi, Sneha Narayan, Sherae Daniel, and Veniamin Veselovsky. 2022. Ethical Tensions, Norms, and Directions in the Extraction of Online Volunteer Work. In *Companion Publication of the 2022 Conference on Computer Supported Cooperative Work and Social Computing*.

[71] Renee Li, Pavitthra Pandurangan, Hana Frluckaj, and Laura Dabbish. 2021. Code of Conduct Conversations in Open Source Software Projects on Github. *Proc. ACM Hum.-Comput. Interact.* 5, CSCW1 (April 2021), 1–31.

[72] Renkai Ma and Yubo Kou. 2021. "How advertiser-friendly is my video?": YouTuber's Socioeconomic Interactions with Algorithmic Content Moderation. *Proceedings of the ACM on Human-Computer Interaction* 5, CSCW2 (2021), 1–25.

[73] Pia Mancini et al. 2017. Sustain: A one day conversation for open source software sustainers – the report. Technical report, Sustain Conference Organization.

[74] Gerardo Matturro. 2013. Soft skills in software engineering: A study of its demand by software companies in Uruguay. In *2013 6th international workshop on cooperative and human aspects of software engineering (CHASE)*. IEEE, 133–136.

[75] Courtney Miller, Sophie Cohen, Daniel Klug, Bogdan Vasilescu, and Christian Kästner. 2022. "Did You Miss My Comment or What?" Understanding Toxicity in Open Source Discussions.
In *44th International Conference on Software Engineering (ICSE '22)*.

[76] Courtney Miller, David Gray Widder, Christian Kästner, and Bogdan Vasilescu. 2019. Why Do People Give Up FLOSSing? A Study of Contributor Disengagement in Open Source. In *Open Source Systems*. Springer International Publishing, 116–129.

[77] Mirela Bucurean. 2019. A qualitative study on passive-aggressive behaviour at workplace. *Annals of the University of Oradea, Economic Science Series* 28, 2 (2019).

[78] Alessandro Murgia, Parastou Tourani, Bram Adams, and Marco Ortu. 2014. Do developers feel emotions? an exploratory analysis of emotions in software artifacts. In *Proceedings of the 11th Working Conference on Mining Software Repositories* (Hyderabad, India) (MSR 2014). Association for Computing Machinery, New York, NY, USA, 262–271.

[79] Sarah Myers West. 2018. Censored, suspended, shadowbanned: User interpretations of content moderation on social media platforms. *New Media & Society* 20, 11 (Nov. 2018), 4366–4383.

[80] Nachiappan Nagappan, Brendan Murphy, and Victor Basili. 2008. The influence of organizational structure on software quality: an empirical case study. In *Proceedings of the 30th international conference on Software engineering*. 521–530.

[81] Ray Oldenburg. 1999. *The great good place: Cafes, coffee shops, bookstores, bars, hair salons, and other hangouts at the heart of a community*. Da Capo Press.

[82] Siobhán O'Mahony and Fabrizio Ferraro. 2007. The Emergence of Governance in an Open Source Community. *AMJ* 50, 5 (Oct. 2007), 1079–1106.

[83] Elinor Ostrom. 2000. Collective action and the evolution of social norms. *Journal of Economic Perspectives* 14, 3 (2000), 137–158.

[84] Christine Porath and Christine Pearson. 2013. The price of incivility. *Harv. Bus. Rev.* 91, 1-2 (Jan. 2013), 114–121, 146.

[85] Huilian Sophie Qiu, Yucen Lily Li, Susmita Padala, Anita Sarma, and Bogdan Vasilescu. 2019. The Signals that Potential Contributors Look for When Choosing Open-source Projects. *Proc. ACM Hum.-Comput. Interact.* 3, CSCW (Nov. 2019), 1–29.

[86] Naveen Raman, Minxuan Cao, Yulia Tsvetkov, Christian Kästner, and Bogdan Vasilescu. 2020. Stress and burnout in open source: toward finding, understanding, and mitigating unhealthy interactions. In *Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: New Ideas and Emerging Results* (Seoul, South Korea) (ICSE-NIER '20). Association for Computing Machinery, New York, NY, USA, 57–60.

[87] Philipp Ranzhin. 2020. I ruin developers' lives with my code reviews and I'm sorry. Retrieved September (2020).

[88] David Ribes, Steven Jackson, Stuart Geiger, Matthew Burton, and Thomas Finholt. 2013. Artifacts that organize: Delegation in the distributed organization. *Information and Organization* 23, 1 (Jan. 2013), 1–14.

[89] Jaydeb Sarker, Asif Kamal Turzo, and Amiangshu Bosu. 2020. A Benchmark Study of the Contemporary Toxicity Detectors on Software Engineering Interactions.

[90] Joseph Seering. 2020. Reconsidering Self-Moderation. *Proceedings of the ACM on Human-Computer Interaction* 4, CSCW2 (2020), 1–28. https://doi.org/10.1145/3415178

[91] Joseph Seering. 2020. Reconsidering Self-Moderation: the Role of Research in Supporting Community-Based Models for Online Content Moderation. *Proc. ACM Hum.-Comput. Interact.* 4, CSCW2 (Oct.
2020), 1–28.

[92] Joseph Seering, Robert Kraut, and Laura Dabbish. 2017. Shaping Pro and Anti-Social Behavior on Twitch Through Moderation and Example-Setting. In *Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing* (Portland, Oregon, USA) (CSCW '17). Association for Computing Machinery, New York, NY, USA, 111–125.

[93] Joseph Seering, Tony Wang, Jina Yoon, and Geoff Kaufman. 2019. Moderator engagement and community development in the age of algorithms. *New Media & Society* 21, 7 (July 2019), 1417–1443. https://doi.org/10.1177/1461444818821316

[94] Joseph Seering, Tony Wang, Jina Yoon, and Geoff Kaufman. 2019. Moderator engagement and community development in the age of algorithms. *New Media & Society* 21, 7 (2019), 1417–1443.

[95] Giuseppe Silvestri, Jie Yang, Alessandro Bozzon, and Andrea Tagarelli. 2015. Linking Accounts across Social Networks: the Case of StackOverflow, Github and Twitter. In *KDWeb*. 41–52.

[96] C Estelle Smith, Bowen Yu, Anjali Srivastava, Aaron Halfaker, Loren Terveen, and Haiyi Zhu. 2020. Keeping Community in the Loop: Understanding Wikipedia Stakeholder Values for Machine Learning-Based Systems. In *Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems*. Association for Computing Machinery, New York, NY, USA, 1–14.

[97] Megan Squire and Rebecca Gazda. 2015. FLOSS as a Source for Profanity and Insults: Collecting the Data. In *2015 48th Hawaii International Conference on System Sciences*. 5290–5298.

[98] Miriah Steiger, Timir J Bharucha, Sukrit Venkatagiri, Martin J Riedl, and Matthew Lease. 2021. The Psychological Well-Being of Content Moderators: The Emotional Labor of Commercial Moderation and Avenues for Improving Support. In *Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems* (Yokohama, Japan) (CHI '21, Article 341). Association for Computing Machinery, New York, NY, USA, 1–14.

[99] John Suler. 2004. The online disinhibition effect. *Cyberpsychol. Behav.* 7, 3 (June 2004), 321–326.

[100] Jason Swarts. 2019. Open-Source Software in the Sciences: The Challenge of User Support. *Journal of Business and Technical Communication* 33, 1 (Jan. 2019), 60–90.

[101] Damian A Tamburri, Patricia Lago, and Hans van Vliet. 2013. Organizational social structures for software engineering. *ACM Computing Surveys (CSUR)* 46, 1 (2013), 1–35.

[102] Xin Tan and Minghui Zhou. 2019. How to Communicate when Submitting Patches: An Empirical Study of the Linux Kernel. *Proc. ACM Hum.-Comput. Interact.* 3, CSCW (Nov. 2019), 1–26.

[103] Bianca Trinkenreich, Igor Wiese, Anita Sarma, Marco Gerosa, and Igor Steinmacher. 2022. Women's participation in open source software: a survey of the literature. *ACM Transactions on Software Engineering and Methodology (TOSEM)* 31, 4 (2022), 1–37.

[104] Jason Tsay, Laura Dabbish, and James Herbsleb. 2014. Let's talk about it: evaluating contributions through discussion in GitHub. In *Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering* (Hong Kong, China) (FSE 2014). Association for Computing Machinery, New York, NY, USA, 144–154.

[105] Bogdan Vasilescu, Daryl Posnett, Baishakhi Ray, Mark G J van den Brand, Alexander Serebrenik, Premkumar Devanbu, and Vladimir Filkov. 2015. Gender and Tenure Diversity in GitHub Teams.
In *Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems* (Seoul, Republic of Korea) (*CHI \u201915*). Association for Computing Machinery, New York, NY, USA, 3789\u20133798.\n\n[106] Gang Wang, Bolun Wang, Tianyi Wang, Ana Nika, Haitao Zheng, and Ben Y Zhao. 2014. Whispers in the dark: analysis of an anonymous social network. In *Proceedings of the 2014 Conference on Internet Measurement Conference* (Vancouver, BC, Canada) (*IMC \u201914*). Association for Computing Machinery, New York, NY, USA, 137\u2013150.\n\n[107] Mairieli Wessel, Alexander Serebrenik, Igor Wiese, Igor Steinmacher, and Marco A Gerosa. 2020. What to Expect from Code Review Bots on GitHub?\n\n[108] Mairieli Wessel, Igor Wiese, Igor Steinmacher, and Marco Aurelio Gerosa. 2021. Don\u2019t Disturb Me: Challenges of Interacting with Software Bots on Open Source Software Projects. *Proc. ACM Hum.-Comput. Interact.* 5, CSCW2 (Oct. 2021), 1\u201321.\n\n[109] Titus Wormer. 2015. alex: Catch insensitive, inconsiderate writing. https://alexjs.com/. Accessed: 2022-7-14.\n\n[110] Yan Xia, Haiyi Zhu, Tun Lu, Peng Zhang, and Ning Gu. 2020. Exploring Antecedents and Consequences of Toxicity in Online Discussions: A Case Study on Reddit. *Proc. ACM Hum.-Comput. Interact.* 4, CSCW2 (Oct. 2020), 1\u201323.\n\n[111] Gi Woong Yun, Sasha Allgayer, and Sung-Yeon Park. 2020. Mind Your Social Media Manners: Pseudonymity, Imaginary Audience, and Incivility on Facebook vs. YouTube. *Int. J. Commun. Syst.* 14, 0 (June 2020), 21.\n\nReceived January 2023; revised April 2023; accepted July 2023", "source": "olmocr", "added": "2025-06-23", "created": "2025-06-23", "metadata": {"Source-File": "/home/nws8519/git/adaptation-slr/studies_pdfs/009-hsieh.pdf", "olmocr-version": "0.1.76", "pdf-total-pages": 29, "total-input-tokens": 76630, "total-output-tokens": 28407, "total-fallback-pages": 0}, "attributes": {"pdf_page_numbers": [[0, 2963, 1], [2963, 7092, 2], [7092, 10646, 3], [10646, 14972, 4], [14972, 19412, 5], [19412, 23429, 6], [23429, 27002, 7], [27002, 30502, 8], [30502, 34022, 9], [34022, 38329, 10], [38329, 42771, 11], [42771, 47238, 12], [47238, 50569, 13], [50569, 54922, 14], [54922, 59589, 15], [59589, 63759, 16], [63759, 67762, 17], [67762, 72129, 18], [72129, 76639, 19], [76639, 80931, 20], [80931, 84180, 21], [84180, 88522, 22], [88522, 92652, 23], [92652, 96479, 24], [96479, 101004, 25], [101004, 106384, 26], [106384, 112046, 27], [112046, 117235, 28], [117235, 121363, 29]]}}
|
|
{"id": "f012865916a78db09c683e3b4913a6792196f780", "text": "The impacts of lockdown on open source software contributions during the COVID-19 pandemic\n\nJin Hu\\textsuperscript{a, b}, Daning Hu\\textsuperscript{b, *}, Xuan Yang\\textsuperscript{c}, Michael Chau\\textsuperscript{a}\n\n\\textsuperscript{a} Faculty of Business and Economics, The University of Hong Kong, Hong Kong\n\\textsuperscript{b} Business School, Southern University of Science and Technology, Shenzhen, Guangdong 518055, China\n\\textsuperscript{c} Department of Informatics, University of Zurich, 8006 Zurich, Switzerland\n\n\\textbf{ARTICLE INFO}\n\n\\textbf{Keywords:}\nCOVID-19\nLockdown\nWork productivity\nOpen source software\nFace-to-face interactions\n\n\\textbf{ABSTRACT}\n\nThe COVID-19 pandemic instigated widespread lockdowns, compelling millions to transition to work-from-home (WFH) arrangements and rely heavily on computer-mediated communications (CMC) for collaboration. This study examines the impacts of lockdown on innovation-driven work productivity, focusing on contributions to open source software (OSS) projects on GitHub, the world's largest OSS platform. By leveraging two lockdowns in China as natural experiments, we discover that developers in the 2021 Xi'an lockdown increased OSS contributions by 9.0\\%, while those in the 2020 Wuhan lockdown reduced their contributions by 10.5\\%. A subsequent survey study elucidates this divergence, uncovering an adaptation effect wherein Xi'an developers became more accustomed to the new norm of WFH over time, capitalizing on the flexibility and opportunities of remote work. Moreover, our findings across both lockdowns reveal that the lack of face-to-face (F2F) interactions significantly impeded OSS contributions, whereas the increased available time at home positively influenced them. This finding is especially noteworthy as it challenges the assumption that CMC can effortlessly substitute for F2F interactions without negatively affecting productivity. We further examine the impacts of stay-at-home orders in the United States (US) on OSS contributions and find no significant effects. Collectively, our research offers valuable insights into the multifaceted impacts of lockdown on productivity, shedding light on how individuals adapt to remote work norms during protracted disruptions like a pandemic. These insights provide various stakeholders, including individuals, organizations, and policymakers, with vital knowledge to prepare for future disruptions, foster sustainable resilience, and adeptly navigate the evolving landscape of remote work in a post-pandemic world.\n\n\\textbf{1. Introduction}\n\nThe COVID-19 pandemic has catalyzed a global transition to work-from-home (WFH) arrangements, as nations implemented lockdown measures to limit human mobility and curb the spread of the virus (Fang et al., 2020; Sheridan et al., 2020; Wang, 2022). This unprecedented shift to remote work, facilitated by a myriad of computer-mediated communications (CMC) technologies, has instigated profound and lasting impacts on work productivity, an area that has garnered significant attention in recent scholarly investigations (Barber et al., 2021; Cui et al., 2022). Understanding such impacts on work productivity is crucial for guiding policy and decision-making at multiple levels. It can help reshape individual approaches to work-life balance, redefine organizational strategies on WFH arrangements, and inform governmental policies or legislation aimed at supporting remote work. 
Moreover, the significant disruptions brought by the pandemic highlight the imperative for adaptability and resilience at all these levels. Studying the effects of lockdown on work productivity can provide valuable insights, enabling stakeholders to better navigate future upheavals and cultivate enduring resilience. However, the impact of lockdown on work productivity, especially within innovation-driven domains such as open source software (OSS) development, remains largely unexplored.

To address this research gap, our study leverages the lockdowns implemented in two of the world's largest economies – the United States (US) and China – during various stages of the pandemic. These lockdowns serve as natural experiments, enabling us to study their impacts on OSS developers' contributions to GitHub, the world's largest OSS platform (GitHub, 2022b). China's Zero-COVID strategy, marked by its uniform and strict lockdown measures across various cities at different times, provides an ideal setting to study OSS contributors' responses to lockdowns. More importantly, it allows us to understand their adaptation to the new normal of WFH throughout various pandemic stages. Meanwhile, the US, with its prominent role in the OSS community and extensive data availability, serves as an optimal environment to extend and validate our findings derived from the Chinese lockdowns, thereby enhancing the generalizability of our insights beyond the specific context of China. Taken together, these natural experiments enable us to delve deeper into how different approaches to managing the pandemic influence OSS contributors' productivity.

Our main difference-in-differences (DID) analysis focused on two lockdowns in China: the initial lockdown in Wuhan in 2020 and another that occurred in Xi'an in 2021. Interestingly, the results revealed a significant positive impact of the 2021 lockdown on the OSS contributions of Xi'an developers, in contrast to the negative impact observed among Wuhan developers during the 2020 lockdown. Moreover, in both lockdowns, the results indicated that developers who made more online comments to their local peers experienced a more pronounced decline in their contributions. To delve deeper into the underlying mechanisms driving these outcomes, we conducted a targeted survey among the developers affected by these two lockdowns.

The survey findings reveal that Xi'an developers reported significantly fewer interruptions and a marked increase in flexibility in making OSS contributions during the later lockdown in 2021. Factors such as fear related to COVID-19 and increased housework responsibilities, which had significantly reduced Wuhan developers' contributions during the initial 2020 lockdown, became insignificant for developers during the 2021 Xi'an lockdown. These findings point to a notable adaptation effect, as developers became more accustomed to the new norms of WFH imposed by the COVID-19 pandemic over time. The survey also found that, for both Wuhan and Xi'an developers, the increase in available time positively influenced OSS contributions.

Moreover, our survey study unveiled that, for both Wuhan and Xi'an developers, the lack of face-to-face (F2F) interactions significantly reduced their contribution levels.
This finding is further corroborated by another survey discovery that identified a strong positive correlation between developers\u2019 tendency to comment on GitHub and their propensity for F2F interactions prior to the lockdown. Coupled with the aforementioned DID analysis, which demonstrated a more pronounced negative impact on contributions from Wuhan and Xi\u2019an developers who engaged in more online commenting activities with their local collaborators, this evidence leads to the inference that developers who frequently engaged in F2F interactions were more adversely affected by the lockdowns in terms of their contributions. This finding underscores the importance of F2F interactions in collaborative work environments and challenges the assumption that CMC can seamlessly replace F2F interactions without any adverse impact on productivity.\n\nFurthermore, we use the DID analysis to examine the impact of stay-at-home lockdown orders in the US on developers\u2019 OSS contributions. This empirical approach is guided by three key considerations. First, assessing the generalizability of the findings from Chinese lockdowns to other contexts is vital, as the impacts of strict lockdown measures like those in China may differ from the effects of milder restrictions adopted elsewhere. Second, the prominence of OSS development in the US, coupled with the extensive data available on GitHub, makes it an apt context for our analysis. Third, the heterogeneity in policies regarding lockdowns across different US states offers a unique opportunity for comparative analysis. This allows for a nuanced understanding of how diverse approaches to pandemic management can influence OSS contributions.\n\nIn addition, by comparing the effects observed in China and the US, we aim to provide valuable insights into the broader implications of lockdown measures on OSS contributions on a global scale. Interestingly, our analysis revealed no significant impact of US lockdowns on developers\u2019 OSS contributions. We posit that this may be attributable to the less strict nature of stay-at-home orders in the US compared to the lockdown measures enforced in China. The relatively lenient restrictions in the US, which permitted essential activities and work, may not have led to significant disruptions in potential F2F interactions or provided additional available time for developers. Consequently, these factors may have exerted minimal effects on their OSS contributions.\n\nOur contributions are threefold. First, by examining the impact of lockdowns on OSS contributions, our study provides novel insights into the effects of remote work on productivity. The nuanced findings on how individuals adapt to new norms of WFH during prolonged periods of disruption can equip various stakeholders \u2013 including individuals, organizations, and governments \u2013 with essential knowledge. This knowledge can guide preparations for similar future disruptions and build sustainable resilience. Second, our research reveals the detrimental effects of reduced F2F interactions, challenging the assumption that CMC can effortlessly replace F2F interactions without compromising productivity. This is especially salient in innovation-driven domains like OSS development. 
This insight enriches the discussion on the comparative impacts of CMC and F2F on the efficacy of virtual teams, a discussion that has become increasingly pertinent in an era where reliance on CMC for remote work is likely to persist even beyond the pandemic (Airbnb, 2022; Warren, 2020). Third, our study stands out through the adoption of systematic causal analysis methods. While previous research on the impact of lockdown has mainly relied on survey methods, our use of DID analysis on empirical data from GitHub enables a more robust examination of the causal effects of lockdowns. This methodological approach, reinforced with various robustness tests, not only strengthens the findings of our study but also offers a valuable framework that can be leveraged in future research. This includes exploring the impact of policy interventions or organizational strategies in response to similar disruptions.

2. Literature review

2.1. COVID-19 and work productivity

The COVID-19 pandemic has led to an unprecedented shift to remote work, with millions mandated to work from home due to government-imposed lockdowns. The impact of WFH arrangements, brought about by those lockdowns, on work productivity has been the subject of intensive study, yielding mixed findings. Several studies found that lockdown-induced WFH is associated with declines in productivity, especially in innovation-oriented work such as software development (Ralph et al., 2020) and scholarly research (Barber et al., 2021; Walters et al., 2022). Ralph et al. (2020) surveyed 2225 software developers across 53 countries and found that both their productivity and well-being were diminished due to COVID-19. The primary influencing factors were fear related to the pandemic, disaster preparedness, and home office ergonomics. Barber et al. (2021) surveyed 1008 members of the American Finance Association, with 78.1% of the respondents suggesting that their research productivity was negatively affected by COVID-19. This was due to the lack of traditional F2F communications to disseminate research and obtain feedback, as well as overwhelming health concerns. Another survey study by Walters et al. (2022) investigated the reasons behind the reported decline in research activity among female academics during lockdowns. The primary reason was that while working from home, female academics were burdened with traditional family roles typically assumed by women, as well as increasing teaching and administrative workloads.

On the other hand, some studies found that productivity in lockdown-induced WFH scenarios has actually increased during this pandemic. Asay (2020) reports that OSS developers consistently increased their work volume in 2020, as they never truly left their work. Cui et al. (2022) found an overall 35% increase in productivity and a 13% increase in the gender gap among social science scholars in the US since the lockdown began. They suggest that while the lockdown could result in substantial time savings for work-related tasks such as commuting, female researchers may find themselves allocating more time for home-related tasks such as childcare.

Another line of research suggests that lockdowns in general have little effect on software developers. Forsgren (2020) reports that the activity of GitHub developers in the early days of COVID-19 was similar to, or slightly higher than, that of the previous year. Neto et al.
(2021) surveyed 279 developers of GitHub projects developed using Java and found that WFH during the pandemic did not affect task completion time, code contribution, or quality. Similar survey studies were conducted with developers at major IT companies like Microsoft (Ford et al., 2021) and Baidu (Bao et al., 2022). They found that lockdown generally had little impact on developers' productivity. However, these developers had differing opinions about the effects of lockdown. Some suggested their productivity benefited from WFH because of fewer disturbances, saved commuting time, and improved work-life balance. Others suggested their productivity suffered from WFH due to increased home-related tasks, decreased collaboration with others, and interruptions from family members.

To summarize, existing studies on the impact of pandemic-induced lockdowns on work productivity have yielded mixed findings and are heavily reliant on survey methods. Moreover, these studies have not sufficiently explored how knowledge workers, such as developers, adapt to remote work settings and how this adaptation influences their productivity during prolonged periods of lockdown. There is a clear need for systematic causal analyses on large empirical datasets to study the impacts and underlying mechanisms of pandemic-induced lockdowns on innovation-related work, taking the effects of adaptation into account.

2.2. Face-to-face communications and computer-mediated communications

Previous research (NicCanna et al., 2021; Smite et al., 2023) has highlighted that one of the direct implications of pandemic-induced lockdowns is the diminished opportunity for traditional F2F interactions and an increased reliance on CMC, both of which have been considered crucial in the realm of OSS development (Crowston et al., 2007; O'Mahony and Ferraro, 2007). Crowston et al. (2007) identify several settings in which OSS developers engage in F2F meetings and the benefits they derive from such interactions. For instance, F2F meetings provide OSS developers with great opportunities to socialize, build teams, and verify each other's identity. They also find that certain OSS development activities are best suited for F2F interactions, such as conveying important news (Boden and Molotch, 1994). Kock (2004) suggests that this is because human beings evolved over many years to excel at F2F interactions. Moreover, O'Mahony and Ferraro (2007) discovered that F2F interactions with OSS community members could increase one's likelihood of ascending to a community leadership role. This is achieved through 1) building more trusting and reciprocal relationships and 2) creating potential coalitions. Butler and Jaffe (2021) also suggested that F2F interactions can significantly influence one's efforts in community building.

These OSS studies are typically conducted in empirical contexts where F2F interactions and CMC co-exist among OSS community members, making it difficult to disentangle their effects. However, the strict lockdown measures in China have presented a unique opportunity to examine developers' OSS contributions in a setting where F2F interactions are entirely absent. An important conjecture is that OSS developers, who have been accustomed to working productively using CMC in a remote and asynchronous manner for decades (Columbro, 2020; Wellman et al., 1996), are less likely to be affected by the absence of F2F interactions during the COVID-19 pandemic.
Our study puts this conjecture to the test by examining the scenario where F2F interactions are largely absent due to the lockdowns in China.

Moving from the specific context of OSS to a more general comparison of F2F interactions and CMC in virtual teams, the findings remain inconclusive. Townsend et al. (1998) find that CMC can facilitate efficient connections between individuals regardless of their geographical locations, thereby significantly improving the performance of virtual teams. Moreover, team members distributed across different time zones can leverage CMC to coordinate more effectively and operate within a more flexible and efficient 24-hour cycle (Lipnack and Stamps, 1999). Therefore, Bergiel et al. (2006) suggest that virtual collaboration via CMC can overcome the constraints of time, distance, and organizational boundaries, leading to improvements in productivity and efficiency among team members.

On the other hand, another stream of the literature suggests that compared with F2F interactions, CMC carries fewer physical and emotional cues, thereby limiting the extent and synchronicity of information exchange (Cramton and Webber, 2005; Daft and Lengel, 1986; Dennis et al., 2008). This can negatively affect team members' capabilities to establish mutual understanding (Kraut et al., 1982; Sproull and Kiesler, 1986; Straus and McGrath, 1994), their sense of belonging, and awareness of group activities (Cramton, 2001). Moreover, in the absence of F2F interactions, individuals are more likely to experience heightened conflicts (Wakefield et al., 2008), leading to decreased team productivity and satisfaction (Hambrick et al., 1998; Lau and Murnighan, 1998). Furthermore, despite recent advances in communication technologies, such as videoconferencing, which allow users to convey more non-verbal information cues than before, the lack of F2F interactions can still negatively affect innovation that relies on collaborative idea generation. Indeed, a recent study (Brucks and Levav, 2022) found that the absence of F2F interactions during the COVID-19 pandemic negatively affected innovation. The authors attribute this finding to the differences between the physical nature of videoconferencing and F2F interactions, as the former narrows individuals' cognitive focus to a display.

To summarize, the existing literature has yet to conclusively establish whether, despite technological advancement, CMC can effectively replace the role of F2F interactions without impacting the productivity of collaborative work. Some studies (Crowston et al., 2007; Ocker et al., 1998) suggest that a mix of both CMC and F2F interactions is most beneficial for teamwork. However, as the preference for remote work and reliance on CMC continue to rise at an unprecedented scale even in the post-pandemic era, our research aims to fill this gap by studying whether CMC can fully replace F2F interactions without negatively affecting teamwork productivity.

2.3. Motivations for open source software contributions

Another stream of research that is very relevant to our study is the literature on motivations for contributing to OSS development. The prevailing framework in this field typically categorizes OSS developers' motivations into intrinsic and extrinsic factors.
Intrinsic motivations often stem from developers' personal needs such as altruism and joy derived from contributing (Davidson et al., 2014; Hertel et al., 2003), whereas extrinsic motivations are usually related to utility-based external rewards, such as opportunities for career advancement (Fang and Neufeld, 2009; Yang et al., 2021). Studies by Hertel et al. (2003) and Shah (2006) have found that intrinsic motivations, such as enjoyment and fun, significantly influence OSS developers' contributions. However, during the COVID-19-induced lockdowns, developers may have experienced fear and stress related to the health of their family and friends, which could negatively affect these intrinsic motivations, especially in the early stages of the pandemic.

However, there is a dearth of OSS motivation research that focuses on the social effects through which developers' contribution motivations are influenced by their interactions with their peers. For instance, individuals' OSS contributions are encouraged by the attention they receive from their peers (Moqri et al., 2018) and collaboration with other team members (Crowston et al., 2007; Daniel and Stewart, 2016; Xu et al., 2009). von Krogh et al. (2012) suggest that aspects of social practice like ethics and virtues are largely overlooked as a context for contribution motivations. These aspects are typically cultivated through social interactions among OSS community members, including both F2F interactions and CMC. Our study aims to enrich the understanding of the research community and policymakers on how major disruptions like lockdowns may limit such social effects, particularly through reduced F2F interactions, and thereby influence OSS developers' contribution motivations.

3. Methods

In this section, we adopt a mixed-method approach to study the impacts of two lockdowns in China on OSS developers' contributions. We treat the lockdowns in Wuhan and Xi'an as natural experiments, and for each GitHub developer in Wuhan or Xi'an, we match her with a developer from comparable regions that did not experience lockdown measures. We then utilize DID and difference-in-difference-in-differences (DDD) analyses, combined with propensity score matching (PSM), to discern the impacts. To delve deeper into the mechanisms that underpin the changes in developers' OSS contributions during the lockdowns, we also administer a survey to GitHub developers in both lockdowns. In Section 4, we report the main results of this analysis and perform a series of robustness tests to validate our findings.

Moreover, in Section 5, we extend our empirical approaches, such as the DID analysis, to data collected from a distinct context – the US. This supplementary analysis is designed to investigate whether the patterns observed in our findings on Chinese lockdowns are also present in other regions. By comparing the effects in China and the US, we aim to provide valuable insights into the wider implications of lockdown measures on OSS contributions on a global scale.

3.1. Experimental settings

COVID-19 has become one of the most severe global pandemics in recent decades (Fang et al., 2020). Our first natural experiment leverages the lockdown imposed in Wuhan, China from January 23 to April 8, 2020, in response to the initial major outbreak of COVID-19. The authorities enforced a citywide lockdown in Wuhan, leading to the closure of all public transport and non-essential businesses.
The residents of all 7148 residential communities in Wuhan were mandated to stay at home and were permitted to leave only in emergencies. The abrupt imposition of the Wuhan lockdown, which was implemented without prior warning, serves as an exogenous shock. This natural experimental setting provides us with an opportunity to examine the impact of the Wuhan lockdown on OSS contributions.

We designate Wuhan developers as the treatment group and choose developers in the Hong Kong, Macau, and Taiwan (HMT) regions as the control group for several reasons. Firstly, most major cities in mainland China swiftly followed Wuhan's lead in implementing strict lockdown or social distancing measures, while the HMT regions did not implement such measures until March 2020. Hong Kong authorities prohibited indoor and outdoor public gatherings of more than four people in March 2020. Meanwhile, although Macau authorities took some ad-hoc measures such as closing casinos and public parks, they did not implement any citywide lockdown measures. Therefore, while developers in Wuhan were strictly required to stay at home in the early stage of this COVID-19 outbreak, those in the HMT regions could go out and engage in F2F interactions. Secondly, compared with developers in other parts of the world, HMT developers are much more similar to Wuhan developers, as they belong to the same ethnic group – Han Chinese (Wikipedia, 2022) – and share similar cultural backgrounds.

We have chosen a ten-week period surrounding the day of the Wuhan lockdown (i.e., between December 19, 2019, and February 27, 2020) as the time frame for the DID analysis, mainly for two reasons. Firstly, this timeframe is short but sufficient to allow us to observe potential changes in developers' contributions. Secondly, as COVID-19 began to spread to other parts of the world, including the HMT regions, their developers might have started to consciously avoid F2F meetings with others to prevent potential COVID-19 infections, even before any lockdown or social distancing measures were implemented. This would make HMT developers less ideal control subjects in the natural experiment. Therefore, we set the end of the time window as February 27, 2020, as COVID-19 cases in the HMT regions only started to increase significantly in March.

We also leverage the lockdown of Xi'an in China as a second natural experiment. The strictness of a city's lockdown measures often corresponds to the severity of the local outbreak, leading to endogeneity when attempting to causally identify the impacts of the lockdown measures. During the pandemic, China's Zero-COVID policy provides an ideal opportunity to address this endogeneity issue. This policy, which is centered around lockdowns, aims to halt the transmission of COVID-19 as soon as cases are detected through mass testing (Chen et al., 2022a). Even a few COVID-19 cases can trigger a full-scale citywide lockdown within a very short period (Chen et al., 2022a). Such swift lockdowns in response to extremely small numbers of new COVID-19 cases minimize the endogeneity of policy responses.

The Xi'an lockdown, which lasted from December 23, 2021, to January 23, 2022, was as strict as the Wuhan lockdown despite far fewer initial infection cases. During the Xi'an lockdown, all public transport and non-essential businesses were suspended, and all Xi'an residents were strictly required to stay at home except for emergencies. Thus, we use Xi'an developers as the treatment group.
To construct the control group, we follow existing studies (Muralidharan and Prakash, 2017; Wang, 2022) by choosing developers in the seven capitals of provinces (or municipalities) neighboring Xi'an that did not implement any lockdown measures during the Xi'an lockdown. This is because developers in these neighboring capitals are more similar to Xi'an developers in many aspects. The timeframe of the DID analysis covers the eight weeks surrounding the day of the Xi'an lockdown (i.e., between November 25, 2021, and January 20, 2022).

3.2. Data collection for the Chinese lockdowns

Our empirical study collects and uses two types of data: GitHub data and COVID-19 case data. We obtain historical GitHub data through its API and the GH Archive database. The latter archives public OSS development activities on GitHub since February 2011 and has been widely used in recent OSS research (Moqri et al., 2018; Negoiţa et al., 2019). We first use the "search-by-location" function of the GitHub API to extract developers who had at least one public repository and were located in the regions chosen for the natural experiments. For each experiment, we further select developers who joined GitHub before the chosen time window and exclude developers who did not push any commit within that window. This procedure yields 1695 Wuhan developers and 5282 HMT developers for the Wuhan case. The selected sample of the Xi'an case includes 919 Xi'an developers and 4274 developers in the seven neighboring provincial capitals (or municipalities). Moreover, we obtain data about COVID-19 cases from relevant health authorities such as the National Health Commission of China as well as mainstream media. This comprehensive data collection allows us to conduct a robust analysis of the impact of Chinese lockdowns on OSS contributions.

3.3. Propensity score matching

To address potential endogeneity issues, we employ the DID technique in conjunction with PSM, following the methodology of previous studies (Chen et al., 2019; Foerderer, 2020). PSM selects control subjects by measuring their distance from the treated subjects based on pre-treatment covariates. This method is particularly effective in overcoming the curse of dimensionality (i.e., too many covariates) by transforming covariate vectors into a single propensity score and then selecting the control subjects closest to the treated ones (Chen et al., 2022b). It allows us to create a more balanced and comparable control group, thereby enhancing the robustness of our findings.

More specifically, we apply one-to-one nearest-neighbor matching without replacement to select a control developer for each treated developer based on a set of observable characteristics before the lockdown (Fang and Neufeld, 2009; Foss et al., 2021; Moqri et al., 2018; Zhang and Zhu, 2011).
These characteristics include: the number of weeks since the developer joined GitHub; whether the developer is a student or an employee based on her profile; whether the developer reports her contact information in her profile; the number of OSS projects that the developer created; the number of commits that the developer contributed on GitHub; the number of stars/issues/comments that the developer received for her repositories; the number of stars/issues/comments that the developer sent out; whether the developer used one of the core languages on GitHub – C, C++, C#, Go, Java, JavaScript, PHP, Python, Ruby, Scala, or TypeScript (GitHub, 2022a) – as her primary programming language; the number of the developer's collaborators who contributed to the same projects with her; the number of the developer's local collaborators who contributed to the same projects and lived in the same region; the average age of the developer's OSS projects; and the number of projects with the General Public License (GPL) created by the developer. GPL, being the most restrictive license, could serve as a proxy for the developer's ideological level (Foss et al., 2021). This PSM procedure yields 1608 matched pairs of Wuhan (treatment) and HMT (control) developers for the Wuhan lockdown and 919 matched pairs of Xi'an (treatment) and neighboring-city (control) developers for the Xi'an lockdown.

Table 1 summarizes the mean values of the pre-treatment characteristics for all developers in the selected regions before matching. The t-test results indicate significant differences across many observable characteristics between developers in lockdown areas and those in non-lockdown areas for both lockdowns. These differences suggest that a direct comparison between the treatment and control groups in the two natural experiments may not be appropriate; therefore, we apply the aforementioned matching procedure. Table 2 reports the mean values of the same characteristics for the matched sample. The t-test results in Table 2 show no significant differences across these observable characteristics between the treatment and matched control groups for either lockdown, suggesting that the matching procedure has effectively balanced them.

### Table 1
T-tests before matching for Chinese lockdowns (columns 2–4: Wuhan lockdown; columns 5–7: Xi'an lockdown).

| Characteristic | Wuhan developers (mean) | HMT developers (mean) | Difference | Xi'an developers (mean) | Neighboring-city developers (mean) | Difference |
|---|---|---|---|---|---|---|
| Weeks | 174.761 | 224.249 | −49.488 *** | 258.309 | 259.105 | −0.796 |
| Student | 0.305 | 0.162 | 0.143 *** | 0.279 | 0.224 | 0.054 *** |
| Employee | 0.232 | 0.290 | −0.058 *** | 0.288 | 0.277 | 0.011 |
| Contact | 0.721 | 0.690 | 0.031 ** | 0.703 | 0.730 | −0.027 * |
| Number of projects | 21.780 | 26.074 | −4.294 *** | 27.349 | 30.320 | −2.970 |
| Commits | 709.337 | 1489.132 | −779.795 ** | 1869.550 | 1725.299 | 144.251 |
| Stars received | 126.292 | 77.595 | 48.697 | 139.702 | 260.150 | −120.448 |
| Issues received | 6.959 | 9.129 | −2.169 | 11.473 | 15.097 | −3.624 |
| Comments received | 12.684 | 24.629 | −11.945 ** | 29.706 | 39.904 | −10.198 |
| Stars sent out | 104.883 | 107.217 | −2.334 | 118.405 | 153.768 | −35.364 *** |
| Issues sent out | 8.740 | 12.421 | −3.681 * | 13.799 | 14.680 | −0.881 |
| Comments sent out | 22.530 | 47.056 | −23.526 ** | 61.226 | 49.162 | 12.064 |
| C | 0.041 | 0.041 | 0.000 | 0.039 | 0.036 | 0.003 |
| C++ | 0.086 | 0.061 | 0.025 *** | 0.073 | 0.064 | 0.009 |
| C# | 0.019 | 0.041 | −0.022 *** | 0.027 | 0.031 | −0.003 |
| Go | 0.026 | 0.024 | 0.002 | 0.065 | 0.070 | −0.005 |
| Java | 0.198 | 0.075 | 0.123 *** | 0.177 | 0.180 | −0.003 |
| JavaScript | 0.202 | 0.225 | −0.023 ** | 0.182 | 0.199 | −0.018 |
| PHP | 0.021 | 0.032 | −0.012 ** | 0.021 | 0.026 | −0.005 |
| Python | 0.170 | 0.196 | −0.026 ** | 0.186 | 0.148 | 0.038 *** |
| Ruby | 0.004 | 0.021 | −0.017 *** | 0.007 | 0.004 | 0.002 |
| Scala | 0.002 | 0.002 | −0.000 | 0.003 | 0.001 | 0.002 |
| TypeScript | 0.007 | 0.010 | −0.003 | 0.016 | 0.025 | −0.008 |
| Collaborators | 367.333 | 899.337 | −532.003 *** | 632.457 | 683.823 | −51.366 |
| Local collaborators | 0.835 | 10.044 | −9.208 *** | 1.342 | 2.193 | −0.851 *** |
| Average age of projects | 64.415 | 88.233 | −23.819 *** | 100.200 | 102.360 | −2.160 |
| Number of projects with GPL | 1.045 | 1.187 | −0.142 | 1.256 | 1.681 | −0.425 ** |

* p < 0.1. ** p < 0.05. *** p < 0.01.
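The matching step can be made concrete with a short sketch. The following Python fragment is a minimal illustration rather than the paper's implementation: the DataFrame layout, the `treated` flag, the covariate list, and the logistic propensity model are assumptions made for the example, and the caliper restriction discussed later (Section 4.3.3) is noted only in a comment.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def match_one_to_one(df, covariates):
    """Greedy 1:1 nearest-neighbor PSM without replacement.

    Expects one row per developer, a binary `treated` column
    (1 = lockdown city), and numeric pre-treatment covariate columns.
    """
    # Step 1: estimate propensity scores P(treated | covariates).
    model = LogisticRegression(max_iter=1000).fit(df[covariates], df["treated"])
    ps = model.predict_proba(df[covariates])[:, 1]
    df = df.assign(logit_ps=np.log(ps / (1 - ps)))  # match on the logit of the score

    treated = df[df["treated"] == 1]
    pool = df[df["treated"] == 0].copy()
    pairs = []
    # Step 2: each treated developer takes the closest control still in the pool.
    for dev_id, row in treated.iterrows():
        match_id = (pool["logit_ps"] - row["logit_ps"]).abs().idxmin()
        # A caliper (e.g., 0.3) would additionally require the absolute
        # distance to fall below a threshold before accepting the match.
        pairs.append((dev_id, match_id))
        pool = pool.drop(index=match_id)  # without replacement
    return pd.DataFrame(pairs, columns=["treated_id", "control_id"])
```

Greedy matching of this kind is order-dependent; the balance checks reported in Tables 1 and 2 are what ultimately validate the matched sample.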
### 3.4. Empirical models

#### 3.4.1. Difference-in-differences model

For each natural experiment, we now examine the change in OSS contributions of every developer selected in the matched sample using the following DID regression framework:

\[
\text{CONTRIBUTION}_{it} = \alpha + \beta \, \text{AFTER}_t \times \text{LOCKDOWN}_i + \gamma \, \text{CV}_{it} + \mu_i + \theta_t + \epsilon_{it} \quad (1)
\]

where \(i\) indexes the developer and \(t\) indexes the week. The dependent variable, \(\text{CONTRIBUTION}_{it}\), is the weekly OSS contribution of each developer: following previous literature (Hu et al., 2023; Moqri et al., 2018; Zhang and Zhu, 2011), we add one to the weekly number of commits a developer contributed to GitHub and then take the logarithm. A commit is a change made to an OSS project, such as adding, modifying, or deleting code. \(\text{AFTER}_t\) is a dummy variable that equals one if the week falls after the day of lockdown and zero otherwise. \(\text{LOCKDOWN}_i\) is a dummy variable that equals one if the developer is in the treatment group (i.e., in the city where lockdown was implemented) and zero otherwise. \(\text{CV}_{it}\) contains a set of control variables that might influence a developer's OSS contributions according to previous research (Fang and Neufeld, 2009; Moqri et al., 2018; Zhang and Zhu, 2011): the number of OSS projects created by the developer (REPO\(_{it}\)), the number of weeks since the developer joined GitHub (TENURE\(_{it}\)), the number of stars the developer received for her repositories (STARR\(_{it}\)), the number of stars the developer sent out (STARS\(_{it}\)), the number of issues the developer received for her repositories (ISSUER\(_{it}\)), the number of issues the developer sent out (ISSUES\(_{it}\)), the number of comments the developer received for her repositories (COMMENTR\(_{it}\)), the number of comments the developer sent out (COMMENTS\(_{it}\)), and the number of new COVID-19 cases in the developer's region (CASE\(_{it}\)).

To control for the effects of time-invariant individual characteristics of developer \(i\), especially those that are unobservable, we incorporate the individual fixed effect \(\mu_i\) in our DID model. Moreover, as opposed to the standard two-period DID model, our DID model spans ten periods for the Wuhan lockdown and eight periods for the Xi'an lockdown. Consequently, we also need to control for factors that remain constant across subjects but vary over periods. We therefore include the time fixed effect \(\theta_t\), which comprises weekly time dummies that control for time trends. The \(\text{LOCKDOWN}_i\) and \(\text{AFTER}_t\) terms of the standard two-period DID model are then absorbed by the individual and time fixed effects, respectively. \(\epsilon_{it}\) is the error term. The coefficient \(\beta\) indicates the impact of lockdown on developers' OSS contributions: a negative coefficient would suggest that the lockdown reduced developers' OSS contributions, whereas a positive coefficient would indicate otherwise.
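To show what estimating Eq. (1) looks like operationally, here is a self-contained Python sketch on a small synthetic developer-week panel. All data and column names are invented for illustration, and the \(\text{CV}_{it}\) controls, omitted for brevity, would enter the formula additively.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_dev, n_week = 200, 10   # toy stand-in for the matched developer-week panel

panel = pd.DataFrame({
    "developer": np.repeat(np.arange(n_dev), n_week),
    "week": np.tile(np.arange(n_week), n_dev),
})
panel["lockdown"] = (panel["developer"] < n_dev // 2).astype(int)   # LOCKDOWN_i
panel["after"] = (panel["week"] >= 5).astype(int)                   # AFTER_t
panel["did"] = panel["after"] * panel["lockdown"]                   # interaction term

# Outcome: log(1 + weekly commits), simulated here with a -0.1 true effect.
commits = rng.poisson(lam=np.exp(0.5 - 0.1 * panel["did"]))
panel["contribution"] = np.log1p(commits)

# C(developer) and C(week) implement the individual (mu_i) and time (theta_t)
# fixed effects; standard errors are clustered by developer.
fit = smf.ols("contribution ~ did + C(developer) + C(week)", data=panel).fit(
    cov_type="cluster", cov_kwds={"groups": panel["developer"]}
)
beta = fit.params["did"]
print(f"beta = {beta:.3f}, implied change = {np.expm1(beta):+.1%}")
```

The final line mirrors the paper's percentage interpretation of the coefficient, i.e., \(e^{\beta} - 1\) (for example, \(e^{-0.111} - 1 \approx -10.5\%\)).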
#### 3.4.2. Difference-in-difference-in-differences models

We now examine the impact of the absence of F2F interactions caused by the lockdowns. If F2F interactions serve as important motivations for OSS contributions, as previous research has suggested (Crowston et al., 2007; Stam, 2009), we expect that developers who regularly engaged in F2F meetings with their collaborators would be more profoundly affected by the lockdown. To this end, we use a GitHub developer's engagement with online comments (i.e., GitHub-supported CMC) as a proxy for her tendency to meet OSS collaborators F2F before the lockdown. This approach is grounded in previous studies that have observed that people who engage more in CMC are also more likely to meet F2F; such a correlation is understood to reflect underlying social needs and preferences (Huang et al., 2022; Khalis and Mikami, 2018; Suphan and Mierzejewska, 2016). Furthermore, CMC has been found to cultivate social relationships and facilitate the coordination of F2F meetings (DiMaggio et al., 2001; Howard et al., 2001; Kraut et al., 2002; Suphan et al., 2012). This relationship between online and offline interaction is further supported by Brandtzæg and Nov (2011), who discovered that Facebook users who prioritize CMC with close friends also interact more frequently in F2F settings. In addition, our survey study in Section 4.3.4 finds that developers in lockdowns who made more online comments to their local GitHub collaborators before the lockdown were also more likely to meet each other F2F, which is consistent with the findings of previous studies (Huang et al., 2022; Khalis and Mikami, 2018; Suphan and Mierzejewska, 2016).

This intricate relationship between CMC and F2F interactions lays the groundwork for our DDD analysis. To operationalize a GitHub developer's tendency to meet her local collaborators F2F, we compute the number of online comments she made to them on the GitHub platform before the lockdown. This metric serves as a proxy for her social engagement and preference for F2F interactions. Building on the baseline DID specification, we develop a more nuanced DDD specification:

$$\text{CONTRIBUTION}_{it} = \alpha + \beta_1 \text{AFTER}_t \times \text{LOCKDOWN}_i + \beta_2 \text{AFTER}_t \times \text{LOCCOMS}_i + \beta_3 \text{AFTER}_t \times \text{LOCKDOWN}_i \times \text{LOCCOMS}_i + \gamma \text{CV}_{it} + \mu_i + \theta_t + \epsilon_{it}$$

(2)

where \(\text{LOCCOMS}_i\) is the number of online comments that developer \(i\) made to her GitHub collaborators in the same region before the lockdown.
It is important to note that the individual fixed effect $\\mu_i$ absorbs the $\\text{LOCKDOWN}_t \\times \\text{LOCCOMS}_i$ term (Foerderer, 2020). We anticipate the coefficient $\\beta_3$ to be significant and negative, indicating that developers who engaged more in online interactions with their local collaborators were adversely affected by the lockdown, leading to reduced contributions to OSS projects (Miller et al., 2019). To ensure that our results are robust to this alternative explanation, we consider the following DDD specification:\n\n$$\\text{CONTRIBUTION}_{it} = \\alpha + \\beta_1 \\text{AFTER}_t \\times \\text{LOCKDOWN}_i + \\beta_2 \\text{AFTER}_t \\times \\text{COMS}_i + \\beta_3 \\text{AFTER}_t \\times \\text{LOCKDOWN}_i \\times \\text{COMS}_i + \\gamma \\text{CV}_i + \\mu_i + \\theta_t + \\epsilon_{it}$$\n\n(3)\n\nwhere COMS$_i$ is the number of online comments that developer $i$ made to all her GitHub collaborators (including non-local ones) before the lockdown. If the alternative explanation is true, then the coefficient $\\beta_3$ should be significant like the one in Eq. (2), as the general social effects should apply to all the GitHub collaborators, regardless of their location.\n\nOn the other hand, if the coefficient $\\beta_3$ is insignificant in Eq. (3) but significant in Eq. (2), this alternative explanation can be dismissed.\n\n4. Results and robustness checks for Chinese lockdowns\n\n4.1. Results from the difference-in-differences model\n\nTable 3 reports the results of Eqs. (1)\u2013(3). Columns (1) and (4) show the results of Eq. (1) for the Wuhan and Xi'an lockdowns, respectively. The coefficient of $\\text{AFTER}_t \\times \\text{LOCKDOWN}_i$ in Column (1) is negative and statistically significant at the 1% significance level, suggesting that the Wuhan lockdown led to a reduction in developers\u2019 OSS contributions. Specifically, a coefficient of $-0.111$ suggests that Wuhan developers\u2019 contributions decreased by 10.5% ($= e^{-0.111} - 1$) over the five weeks following the lockdown. In contrast, the coefficient of $\\text{AFTER}_t \\times \\text{LOCKDOWN}_i$ in Column (4) is positive and significant at the 5% level, suggesting the Xi'an lockdown resulted in an increase in developers\u2019 OSS contributions. A coefficient of 0.086 suggests that the Xi'an developers\u2019 contributions increased by roughly 9.0% ($= e^{0.086} - 1$) over the four weeks after the lockdown.\n\nAccording to the findings of our survey study presented in Section 4.3.4, these contrasting results between the Wuhan and Xi'an lockdowns can be mainly attributed to an adaption effect. When COVID-19 initially emerged in Wuhan, the unprecedented nature of the virus, coupled with its rapid spread, and severity likely instilled a high level of fear and uncertainty among the population. 
4. Results and robustness checks for Chinese lockdowns

4.1. Results from the difference-in-differences model

Table 3 reports the results of Eqs. (1)–(3). Columns (1) and (4) show the results of Eq. (1) for the Wuhan and Xi'an lockdowns, respectively. The coefficient of \(\text{AFTER}_t \times \text{LOCKDOWN}_i\) in Column (1) is negative and statistically significant at the 1% level, suggesting that the Wuhan lockdown led to a reduction in developers' OSS contributions. Specifically, a coefficient of −0.111 suggests that Wuhan developers' contributions decreased by 10.5% (\(= e^{-0.111} - 1\)) over the five weeks following the lockdown. In contrast, the coefficient of \(\text{AFTER}_t \times \text{LOCKDOWN}_i\) in Column (4) is positive and significant at the 5% level, suggesting that the Xi'an lockdown resulted in an increase in developers' OSS contributions. A coefficient of 0.086 suggests that Xi'an developers' contributions increased by roughly 9.0% (\(= e^{0.086} - 1\)) over the four weeks after the lockdown.

### Table 3
Results of Eqs. (1)–(3); the dependent variable is CONTRIBUTION\(_{it}\). Columns (1)–(3): Wuhan lockdown (2020); columns (4)–(6): Xi'an lockdown (2021).

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| AFTER\(_t\) × LOCKDOWN\(_i\) | −0.111*** | −0.108*** | −0.110*** | 0.086** | 0.089** | 0.085** |
| | (0.029) | (0.030) | (0.030) | (0.043) | (0.043) | (0.043) |
| AFTER\(_t\) × LOCCOMS\(_i\) | 0.003*** | 0.001** | 0.001** | 0.000*** | 0.000*** | 0.000*** |
| AFTER\(_t\) × LOCKDOWN\(_i\) × COMS\(_i\) | −0.007*** | 0.003 | 0.003 | 0.000 | 0.000 | 0.000 |
| | (0.003) | (0.000) | (0.000) | (0.000) | (0.000) | (0.000) |
| AFTER\(_t\) × COMS\(_i\) | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| | (0.000) | (0.000) | (0.000) | (0.000) | (0.000) | (0.000) |
| REPO\(_{it}\) | 0.024 | 0.024 | 0.024 | 0.372*** | 0.372*** | 0.372*** |
| | (0.022) | (0.022) | (0.022) | (0.028) | (0.028) | (0.028) |
| TENURE\(_{it}\) | −0.030 | −0.030 | −0.029 | 0.044 | 0.047 | 0.044 |
| | (0.039) | (0.039) | (0.039) | (0.051) | (0.051) | (0.051) |
| STARR\(_{it}\) | 0.016** | 0.016** | 0.016** | 0.003 | 0.003 | 0.003 |
| | (0.008) | (0.008) | (0.008) | (0.002) | (0.002) | (0.002) |
| STARS\(_{it}\) | 0.015** | 0.015** | 0.015** | 0.027*** | 0.027*** | 0.027*** |
| | (0.006) | (0.006) | (0.006) | (0.008) | (0.008) | (0.008) |
| ISSUER\(_{it}\) | −0.021 | −0.020 | −0.022 | 0.014 | 0.014 | 0.014 |
| | (0.025) | (0.025) | (0.025) | (0.038) | (0.038) | (0.038) |
| ISSUES\(_{it}\) | 0.076*** | 0.076*** | 0.077*** | 0.058* | 0.058* | 0.057* |
| | (0.028) | (0.028) | (0.028) | (0.033) | (0.033) | (0.033) |
| COMMENTR\(_{it}\) | 0.022 | 0.022 | 0.022 | 0.011 | 0.011 | 0.011 |
| | (0.014) | (0.014) | (0.014) | (0.011) | (0.011) | (0.011) |
| COMMENTS\(_{it}\) | 0.047*** | 0.047*** | 0.047*** | 0.053*** | 0.053*** | 0.053*** |
| | (0.014) | (0.014) | (0.014) | (0.007) | (0.007) | (0.008) |
| CASE\(_{it}\) | 0.000 | 0.000 | 0.000 | −0.000 | −0.000 | −0.000 |
| | (0.000) | (0.000) | (0.000) | (0.000) | (0.000) | (0.000) |
| Constant | 5.769 | 5.727 | 5.585 | −10.529 | −11.429 | −10.494 |
| | (6.688) | (6.690) | (6.685) | (13.023) | (13.052) | (13.041) |
| Individual FE | Yes | Yes | Yes | Yes | Yes | Yes |
| Time FE | Yes | Yes | Yes | Yes | Yes | Yes |
| Observations | 32,160 | 32,160 | 32,160 | 14,704 | 14,704 | 14,704 |
| R-squared | 0.048 | 0.048 | 0.048 | 0.083 | 0.083 | 0.083 |

Robust standard errors in parentheses. * p < 0.1. ** p < 0.05. *** p < 0.01.

According to the findings of our survey study presented in Section 4.3.4, these contrasting results between the Wuhan and Xi'an lockdowns can be mainly attributed to an adaptation effect. When COVID-19 initially emerged in Wuhan, the unprecedented nature of the virus, coupled with its rapid spread and severity, likely instilled a high level of fear and uncertainty among the population. Therefore, Wuhan developers may have found it challenging to focus on and contribute to OSS projects during this turbulent period (Neto et al., 2021; Ralph et al., 2020). On the other hand, the Xi'an lockdown occurred nearly two years after Wuhan's, following more than a dozen city-level lockdowns. By that time, the residents of Xi'an were much more familiar with the virus and the associated lockdown measures, and they did not experience the same level of fear as those in Wuhan. They had adapted more readily to the new lifestyles induced by lockdown measures, including the new norm of WFH.
This adaptation, coupled with the opportunities offered by WFH, such as increased available time and flexibility, may have enabled Xi'an developers to increase their OSS contributions (Ford et al., 2021; Neto et al., 2021).

4.2. Results from the difference-in-difference-in-differences models

Columns (2) and (5) of Table 3 present the results of Eq. (2) for the Wuhan and Xi'an lockdowns, respectively. The significant and negative coefficient of \( \text{AFTER}_t \times \text{LOCKDOWN}_i \times \text{LOCCOMS}_i \) in Column (2) suggests that Wuhan developers who engaged more in online comments with their local GitHub collaborators were more negatively affected by the Wuhan lockdown. As indicated by our survey results in Section 4.3.4, developers who made more online comments to their local collaborators were more likely to meet each other F2F before the lockdown. Therefore, the above result indicates that Wuhan developers who were more likely to have F2F interactions before the lockdown experienced a more pronounced reduction in their OSS contributions. Similarly, the significant and negative coefficient of \( \text{AFTER}_t \times \text{LOCKDOWN}_i \times \text{LOCCOMS}_i \) in Column (5) suggests that the positive effect on OSS contributions was weaker for Xi'an developers who had engaged more in online comments with local collaborators. These Xi'an developers were also more likely to have F2F interactions, reflecting a similar pattern to that observed in the Wuhan lockdown. These findings highlight the importance of F2F interactions in sustaining OSS contributions, and the loss of these interactions during lockdowns has a significant impact on such contributions.

Columns (3) and (6) of Table 3 report the results of Eq. (3) for the Wuhan and Xi'an lockdowns, respectively. The coefficients of \( \text{AFTER}_t \times \text{LOCKDOWN}_i \times \text{COMS}_i \) are insignificant at the 10% level in both columns. As elaborated in Section 3.4.2, these results rule out the alternative explanation that developers who were more socially engaged in general became more concerned about the pandemic's negative impacts on others, leading to a decrease in their OSS contributions. Instead, the results of Eqs. (2) and (3) highlight that it is not the social nature of the developers but specifically the loss of F2F interactions during lockdowns that influences developers' OSS contributions.

In summary, the DID regression results suggest that the Wuhan lockdown led to a significant reduction in developers' OSS contributions, while the Xi'an lockdown resulted in an increase. Further analysis through the DDD regressions highlights the importance of F2F interactions in driving developers' OSS contributions on GitHub. The absence of these F2F interactions, brought about by lockdown measures, appears to negatively influence such contributions.

4.3. Robustness checks

4.3.1. Parallel trends

The key identification assumption for the DID estimation is the parallel trends assumption. This assumption posits that, before the lockdown, the OSS contributions of both the treatment group and the control group would follow the same temporal trend.
If this assumption were not satisfied, the estimated effects could be biased, as the results could be driven by systematic differences between the treatment and control groups rather than by the lockdown itself.

To ascertain the validity of our analysis, we conduct two sets of tests to examine whether our analysis satisfies this assumption. First, we plot the weekly average contributions (per developer) made by the treatment group (blue) and the control group (red) during the time window surrounding the Wuhan lockdown in Fig. 1(a) and the Xi'an lockdown in Fig. 1(b). To measure a developer's weekly contributions, we add one to the weekly number of commits she contributed to GitHub and then take the logarithm, consistent with the measure in our DID and DDD models. The green vertical line in each figure demarcates the day of lockdown. As Fig. 1(a) shows, the treatment and control groups exhibit almost identical contribution trends before the Wuhan lockdown, thereby fulfilling the parallel trends assumption. On the other hand, in the five weeks following the day of the Wuhan lockdown, the contributions of the treatment group consistently fall below those of the control group, as evidenced by the substantial and persistent gap between the red and blue lines. Fig. 1(b) shows a similar pattern for the Xi'an lockdown, where both the treatment and control groups tend to contribute less over time before the lockdown, thus satisfying the parallel trends assumption. However, after the day of the Xi'an lockdown, the control group continues the decreasing trend, while the treatment group exhibits a tendency to increase contributions.

Second, to further validate our findings, we adopt an event-study approach, a method commonly used in previous literature (Leslie and Wilson, 2020; Tanaka and Okamoto, 2021). This approach involves fitting the following equation:

\[
\text{CONTRIBUTION}_{it} = \alpha + \sum_{\substack{k=-n \\ k \neq -1,\,0}}^{n} \beta_k \, \text{WEEK}_t^k \times \text{LOCKDOWN}_i + \gamma \, \text{CV}_{it} + \mu_i + \theta_t + \epsilon_{it} \quad (4)
\]

where \( n \) equals 5 for the Wuhan lockdown and 4 for the Xi'an lockdown. \( \text{WEEK}_t^k \) is a dummy variable that equals one if week \( t \) corresponds to relative week \( k \), and zero otherwise. We do not construct the week \( k = 0 \) in our sample but use the day of lockdown to separate the pre-treatment and post-treatment periods. \( k = -1 \) indicates the week just before the day of lockdown, so it is dropped from the equation as the reference week. Intuitively, \( \beta_k \) captures the difference in contributions between the treatment and control groups in week \( k \) relative to week \( -1 \). We expect the two groups to make similar contributions before the day of lockdown (\( k < 0 \)) and to diverge after it (\( k > 0 \)).

Fig. 2(a) and Fig. 2(b) show the estimated \( \beta_k \) in Eq. (4) for the Wuhan and Xi'an lockdowns, respectively. The green vertical line in each figure represents the day of lockdown, while the gray dotted lines surrounding each coefficient depict 95% confidence intervals. In both figures, the estimated \( \beta_k \) (\( k < 0 \)) are all nearly zero, indicating no pre-treatment difference in the contribution trends between the treatment and control groups. This pattern confirms that the parallel trends assumption is satisfied in our analysis.
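Eq. (4) can likewise be sketched on the synthetic panel from the earlier examples: the code below builds treatment-group dummies for each week relative to an assumed lockdown week and omits k = −1 as the reference period. Unlike the paper's setting, the toy panel has no partial week at k = 0, so no further week is skipped.

```python
# Relative event time; in the toy panel the lockdown falls before week 5.
panel["rel_week"] = panel["week"] - 5

terms = []
for k in sorted(panel["rel_week"].unique()):
    if k == -1:            # reference week, dropped from the equation
        continue
    name = f"wk_m{-k}" if k < 0 else f"wk_p{k}"
    panel[name] = ((panel["rel_week"] == k) & (panel["lockdown"] == 1)).astype(int)
    terms.append(name)

fit_es = smf.ols(
    "contribution ~ " + " + ".join(terms) + " + C(developer) + C(week)",
    data=panel,
).fit(cov_type="cluster", cov_kwds={"groups": panel["developer"]})
# Pre-lockdown coefficients near zero support the parallel trends assumption;
# post-lockdown coefficients trace out the dynamic treatment effect.
print(fit_es.params[terms])
```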
4.3.2. A falsification test

To ensure that our estimated effects are not artifacts of seasonality, we conduct a falsification test to demonstrate that the effects are not replicated in a period without lockdowns. This involves repeating the DID analysis for the same time window in previous years, when COVID-19 had not yet emerged (Cui et al., 2022; Zhang and Zhu, 2011). For the Wuhan lockdown, we repeat the DID analysis using data from one lunar year earlier, as the time window encompasses the Chinese New Year holiday. For the Xi'an lockdown, we use data from two years earlier, considering that some developers might have experienced lockdowns one year earlier, during a period when lockdowns had become more common in China. The control variable \( \text{CASE}_{it} \) is excluded from this analysis since COVID-19 had not yet broken out during these earlier periods. This falsification test serves as a robustness check: if our original DID analysis were merely capturing seasonal effects, we would expect to find significant effects in these previous years as well. The absence of such effects would strengthen the validity of our main findings, confirming that the observed changes in OSS contributions are indeed attributable to the lockdowns and not to underlying seasonal patterns.

Table 4 reports the results of the falsification test. The placebo treatment effects are found to be insignificant for both the Wuhan and Xi'an lockdowns. This implies that the developers in the treated groups did not significantly change their contributions during the same time window in previous years, thus ruling out seasonal effects as a driving factor behind the observed changes in OSS contributions.
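Operationally, the falsification test reuses the DID specification on a window shifted into a pre-pandemic year. The sketch below assumes a hypothetical `build_panel(start, end)` helper that assembles a developer-week panel (with the same columns as the earlier examples) for an arbitrary calendar window; the dates shown are placeholders, not the study's exact placebo window.

```python
# Placebo DID: same specification, same calendar window, shifted one (lunar)
# year earlier for Wuhan and two years earlier for Xi'an. `build_panel` is a
# hypothetical loader; the CASE_it control is dropped (no COVID-19 cases yet).
placebo_panel = build_panel(start="2019-01-12", end="2019-03-22")  # illustrative dates
placebo_fit = smf.ols(
    "contribution ~ did + C(developer) + C(week)", data=placebo_panel
).fit(cov_type="cluster", cov_kwds={"groups": placebo_panel["developer"]})
# An insignificant `did` coefficient here argues against seasonality
# driving the main estimates.
print(placebo_fit.pvalues["did"])
```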
Table 6 shows the regression results of Eqs. (1)–(3) based on the alternative matched samples. The coefficients of \( \text{AFTER}_t \times \text{LOCKDOWN}_i \), \( \text{AFTER}_t \times \text{LOCKDOWN}_i \times \text{LOCCOMS}_i \), and \( \text{AFTER}_t \times \text{LOCKDOWN}_i \times \text{COMS}_i \) are consistent with those in the main analyses for both the Wuhan and Xi'an lockdowns, suggesting that our results are not driven by the choice of the caliper in the PSM process.

4.3.4. A survey study

We further complement our empirical analyses with a survey study, conducted to delve into the underlying mechanisms and influencing factors behind the changes in developers' OSS contributions before and after the lockdowns. This survey targeted the treated developers in our matched sample who had provided their email addresses on GitHub, encompassing 879 Wuhan developers and 463 Xi'an developers. To encourage participation, we offered an incentive of 20 Chinese Yuan to each respondent who successfully completed the questionnaire. The questionnaire, detailed in Appendix A, was designed with questions answered on a five-point Likert scale. Eventually, we received 109 responses from the Wuhan developers and 71 responses from the Xi'an developers.

Another objective of our survey study was to justify an important assumption underlying our DDD analysis: developers who engaged in more online comments with their local collaborators on GitHub were also more likely to meet them F2F. To examine this relationship, we surveyed the treated developers about their tendencies in both online commenting and F2F interactions with their local collaborators. We then conducted a correlation test on these tendencies for both the Wuhan and Xi'an developers, the results of which are detailed in Table A1 in Appendix A. The findings reveal significant and positive correlation coefficients between the tendencies for online commenting and F2F interactions. This supports the assumption of our DDD analysis, reinforcing the validity of our empirical approach and the conclusions drawn from it.

Table 7 shows the results of a linear regression analysis that explores various surveyed factors to explain the changes in OSS contributions during the two Chinese lockdowns. The dependent variable represents the change in contributions, calculated as the difference between a respondent's total contributions on GitHub during the post-treatment period and her total contributions during the pre-treatment period. The independent variables consist of the respondents' ratings for each of the surveyed factors, as detailed in Questions 2–6 in the questionnaire provided in Appendix A. These factors were carefully selected for inclusion in the questionnaire based on previous research findings related to work productivity during COVID-19-induced WFH scenarios (Bao et al., 2022; Ford et al., 2021; Miller et al., 2021; Neto et al., 2021; Walters et al., 2022).
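The regression behind Table 7 is an ordinary least squares model of the contribution change on the Likert-scale ratings. A minimal sketch follows, with illustrative column names for the survey responses (e.g., `available_time`, `fear`) that are ours rather than the questionnaire's internal labels:

```python
# Sketch of the Table 7 style regression: change in contributions regressed
# on the surveyed factors (Questions 2-6), with robust (HC1) standard errors.
import pandas as pd
import statsmodels.formula.api as smf

FACTORS = ["available_time", "interruptions", "flexibility",
           "work_environment", "fear", "lack_f2f", "lack_boundary",
           "lack_discipline", "family_care", "housework"]

def survey_regression(responses: pd.DataFrame):
    # delta_contribution = post-treatment minus pre-treatment total commits
    formula = "delta_contribution ~ " + " + ".join(FACTORS)
    return smf.ols(formula, data=responses).fit(cov_type="HC1")
```

The correlation test reported in Table A1 is simpler still: for each pair of tendencies, a call such as `scipy.stats.pearsonr(responses["online_freq"], responses["offline_freq"])` (again with hypothetical column names) returns the correlation coefficient and its p-value.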
### Table 4
Falsification test results for Chinese lockdowns.

| Dependent variable: CONTRIBUTION<sub>it</sub> | Wuhan lockdown (1) | Xi'an lockdown (2) |
|---|---|---|
| AFTER<sub>t</sub> × LOCKDOWN<sub>i</sub> | 0.018 (0.019) | −0.024 (0.024) |
| REPO | 0.128 *** (0.028) | 0.296 *** (0.031) |
| TENURE | 0.010 (0.026) | 0.030 (0.036) |
| STARR | 0.001 (0.001) | −0.000 (0.000) |
| STARS | 0.045 *** (0.007) | 0.030 *** (0.005) |
| ISSUER | −0.002 (0.038) | 0.045 (0.050) |
| ISSUES | 0.078 ** (0.040) | 0.095 ** (0.045) |
| COMMENTR | 0.006 (0.013) | −0.010 (0.014) |
| COMMENTS | 0.079 *** (0.016) | 0.078 *** (0.015) |
| Constant | −0.928 (3.143) | −4.253 (5.374) |
| Individual FE | Yes | Yes |
| Time FE | Yes | Yes |
| Observations | 32,160 | 14,704 |
| R-squared | 0.151 | 0.083 |

Robust standard errors in brackets.

* p < 0.1.
** p < 0.05.
*** p < 0.01.

### Table 5
T-tests in the alternative matched sample for Chinese lockdowns.

| | Wuhan lockdown | | | Xi'an lockdown | | |
|---|---|---|---|---|---|---|
| | Treatment group | Control group | Difference | Treatment group | Control group | Difference |
| Weeks | 177.281 | 177.547 | −0.266 | 257.616 | 258.481 | −0.865 |
| Student | 0.277 | 0.290 | −0.013 | 0.277 | 0.281 | −0.004 |
| Employee | 0.244 | 0.247 | −0.003 | 0.286 | 0.278 | 0.008 |
| Contact | 0.705 | 0.713 | −0.008 | 0.702 | 0.721 | −0.019 |
| Number of projects | 21.423 | 21.780 | −0.357 | 27.156 | 25.883 | 1.273 |
| Commits | 663.480 | 499.008 | 164.473 | 1710.940 | 1299.883 | 411.057 |
| Stars received | 64.570 | 90.427 | −25.857 | 140.910 | 126.865 | 14.044 |
| Issues received | 5.550 | 5.620 | −0.071 | 11.545 | 9.189 | 2.357 |
| Comments received | 10.196 | 11.230 | −1.034 | 29.975 | 27.722 | 2.253 |
| Stars sent out | 94.332 | 94.636 | −0.304 | 119.413 | 110.372 | 9.041 |
| Issues sent out | 6.821 | 7.455 | −0.634 | 12.710 | 10.969 | 1.741 |
| Comments sent out | 16.331 | 19.586 | −3.255 | 46.818 | 40.392 | 6.426 |
| C | 0.043 | 0.041 | 0.002 | 0.040 | 0.044 | −0.004 |
| C++ | 0.086 | 0.083 | 0.003 | 0.073 | 0.086 | −0.013 |
| C# | 0.021 | 0.022 | −0.001 | 0.028 | 0.032 | −0.004 |
| Go | 0.027 | 0.028 | −0.001 | 0.064 | 0.070 | −0.006 |
| Java | 0.153 | 0.170 | −0.017 | 0.180 | 0.174 | 0.006 |
| JavaScript | 0.209 | 0.215 | −0.006 | 0.184 | 0.180 | 0.004 |
| PHP | 0.021 | 0.021 | 0.001 | 0.021 | 0.020 | 0.001 |
| Python | 0.182 | 0.168 | 0.014 | 0.183 | 0.179 | 0.004 |
| Ruby | 0.004 | 0.004 | 0.000 | 0.004 | 0.003 | 0.001 |
| Scala | 0.002 | 0.001 | 0.001 | 0.001 | 0.000 | 0.001 |
| TypeScript | 0.008 | 0.006 | 0.002 | 0.017 | 0.020 | −0.003 |
| Collaborators | 184.192 | 167.135 | 17.057 | 561.135 | 527.577 | 33.557 |
| Local collaborators | 0.789 | 0.830 | −0.041 | 1.254 | 1.092 | 0.162 |
| Average age of projects | 65.386 | 65.249 | 0.137 | 100.184 | 98.570 | 1.615 |
| Number of projects with GPL | 1.002 | 1.140 | −0.138 | 1.267 | 1.329 | −0.062 |

* p < 0.1.
** p < 0.05.
*** p < 0.01.

### Table 6
Regression results for the alternative sample for Chinese lockdowns.

| | Wuhan lockdown | | | Xi'an lockdown | | |
|---|---|---|---|---|---|---|
| | (1) | (2) | (3) | (4) | (5) | (6) |
| AFTER × LOCKDOWN | −0.106 *** (0.030) | −0.103 *** (0.030) | −0.101 *** (0.030) | 0.089 *** (0.043) | 0.092 ** (0.044) | 0.090 ** (0.043) |
| AFTER × LOCCOMS | | 0.003 *** (0.001) | | | 0.001 ** (0.000) | |
| AFTER × LOCKDOWN × LOCCOMS | | −0.007 *** (0.003) | | | −0.003 *** (0.001) | |
| AFTER × COMS | | | | | | |
| AFTER × LOCKDOWN × COMS | | | 0.000 (0.000) | | | 0.000 (0.000) |
| REPO | 0.024 (0.021) | 0.024 (0.021) | 0.024 (0.021) | 0.371 *** (0.028) | 0.371 *** (0.028) | 0.371 *** (0.028) |
| TENURE | −0.038 (0.040) | −0.038 (0.040) | −0.038 (0.040) | 0.046 (0.051) | 0.050 (0.051) | 0.047 (0.051) |
| STARR | 0.014 ** (0.007) | 0.014 ** (0.007) | 0.014 ** (0.007) | 0.003 (0.002) | 0.003 (0.002) | 0.003 (0.002) |
| STARS | 0.015 ** (0.006) | 0.015 ** (0.006) | 0.015 ** (0.007) | 0.027 *** (0.008) | 0.027 *** (0.008) | 0.027 *** (0.008) |
| ISSUER | −0.036 (0.027) | −0.036 (0.027) | −0.037 (0.027) | 0.010 (0.039) | 0.010 (0.039) | 0.012 (0.039) |
| ISSUES | 0.088 *** (0.031) | 0.087 *** (0.031) | 0.089 *** (0.031) | 0.065 * (0.036) | 0.064 * (0.035) | 0.063 * (0.036) |
| COMMENTR | 0.023 (0.015) | 0.023 (0.015) | 0.023 (0.015) | 0.012 (0.011) | 0.012 (0.011) | 0.012 (0.011) |
| COMMENTS | 0.047 *** (0.016) | 0.047 *** (0.016) | 0.047 *** (0.016) | 0.052 *** (0.008) | 0.052 *** (0.008) | 0.050 *** (0.009) |
| CASE | −0.000 (0.000) | −0.000 (0.000) | −0.000 (0.000) | −0.000 (0.000) | −0.000 (0.000) | −0.000 (0.000) |
| Constant | 7.166 (6.877) | 7.125 (6.879) | 7.101 (6.874) | −11.153 (13.010) | −12.068 (13.038) | −11.456 (13.058) |
| Individual FE | Yes | Yes | Yes | Yes | Yes | Yes |
| Time FE | Yes | Yes | Yes | Yes | Yes | Yes |
| Observations | 31,140 | 31,140 | 31,140 | 14,496 | 14,496 | 14,496 |
| R-squared | 0.048 | 0.048 | 0.048 | 0.082 | 0.082 | 0.084 |

Robust standard errors in brackets.

* p < 0.1.
** p < 0.05.
*** p < 0.01.

Columns (1) and (2) of Table 7 present the regression results based on the responses from the Wuhan and Xi'an developers, respectively. These results reveal that fear related to the COVID-19 pandemic and housework burden, which significantly curtailed OSS contributions during Wuhan's initial lockdown, no longer impacted Xi'an developers in 2021. On the other hand, the availability of uninterrupted time and increased flexibility positively influenced Xi'an developers' OSS contributions, a pattern not observed among their Wuhan counterparts in 2020.
These findings, taken together with our DID and DDD regression results, highlight an adaptation effect among Xi'an developers.

More specifically, we posit that the Xi'an lockdown, occurring nearly two years after Wuhan's and following numerous city-level lockdowns, allowed developers to adapt to the new norm of remote work. This adaptation allowed Xi'an developers to leverage the flexibility and opportunities provided by WFH, resulting in increased OSS contributions. In contrast, Wuhan developers, facing the novel threat of COVID-19, were impeded by fear and possibly lacked the capacity to engage in voluntary activities like OSS contributions. Moreover, the results show consistent patterns for both Wuhan and Xi'an developers: the lack of F2F interactions significantly reduced their OSS contributions, while increased available time at home positively influenced them. These findings deepen our understanding of how individuals adapt to unprecedented disruptions and provide valuable guidance for stakeholders in preparing for future challenges and fostering resilience.

### 5. Results for the US lockdowns

In the preceding sections, we conducted a comprehensive examination of the impacts of lockdowns on OSS contributions within the context of China. To broaden our understanding and assess the applicability of our findings beyond China, this section introduces the results of an additional empirical analysis focusing on the lockdowns in the US. As explained in Section 1, the rationale for focusing on the US stems from its prominent role in the global OSS development community, as well as its unique circumstances surrounding the implementation of lockdown measures (i.e., stay-at-home orders) during the COVID-19 pandemic. By comparing the observed effects in China with those in the US, we seek to determine whether similar patterns emerge across different regions. This comparative analysis not only enhances the robustness of our findings but also contributes valuable insights into the broader implications of lockdown measures for the OSS development community worldwide.

During the early stages of the virus's spread, between March and April of 2020, a total of 45 states and the District of Columbia in the US implemented either statewide or partial-state stay-at-home orders. These orders restricted residents from leaving their homes except for essential activities, such as obtaining food and performing essential work functions. In contrast, the remaining five states questioned the necessity of such strict lockdown measures and refrained from issuing stay-at-home orders (Wu et al., 2020). One primary rationale behind this resistance was the belief that residents would continue to leave their homes for shopping or work, rendering the stay-at-home orders ineffective (Wang, 2022).

In alignment with the methodology outlined in Wang (2022), our study design constructs a control group consisting of OSS developers in all the states that refrained from implementing any stay-at-home orders. To form a treatment group, we follow the approach employed in earlier studies (Muralidharan and Prakash, 2017; Wang, 2022), selecting developers in states that both implemented statewide stay-at-home orders and are geographically adjacent to the control states. This selection criterion is based on the assumption that neighboring states are more likely to share similarities with the control group in both observable and unobservable characteristics.
To refine our selection, we first extract developers who had at least one public repository and were exclusively located in one state within the US. We then further narrow down the treatment group by including only developers in states with fewer than ten thousand GitHub developers, ensuring consistency with the control group, where all states meet this criterion. The resulting control group consists of developers in five states – Arkansas, Iowa, Nebraska, North Dakota, and South Dakota. The treatment group includes developers in six neighboring states – Louisiana, Mississippi, Missouri, Montana, Tennessee, and Wisconsin. Table 8 provides a detailed summary of the start and end dates of the stay-at-home orders in these states, as obtained from the official announcements of each respective state. This process enhances the comparability between the treatment and control groups, thereby strengthening the validity of our analysis.

Following the approach delineated by Wang (2022), we focus on the time window spanning from March 9, 2020, to April 20, 2020. This timeframe ensures that all developers in the treatment group have at least two weeks of data before and after the implementation of the stay-at-home orders. Consistent with Section 3.2, we include only those developers who joined GitHub before the chosen time window and pushed at least one commit during that period. Through this selection process, we arrive at a final data sample comprising 2583 treated developers and 4487 control developers.

As in our analysis of the Chinese lockdowns, we employ DID combined with PSM on the final data sample for the US lockdowns. First, we apply one-to-one nearest-neighbor matching without replacement, selecting a control developer for each treated developer. This matching is based on the same set of covariates used in the analysis of the Chinese lockdowns, ensuring methodological consistency. Through this procedure, we obtain 2583 matched pairs of treatment and control developers. Table 9 summarizes the mean values of the pre-treatment characteristics for the treatment and control groups before and after matching. The t-test results confirm that there are no significant differences across these characteristics between the treatment and control groups after matching. This successful matching enhances the validity of our subsequent analysis by ensuring that the treatment and control groups are comparable in terms of observable characteristics, thereby minimizing potential biases.

We then estimate the impact of stay-at-home orders on OSS contributions using the matched sample, employing a time-varying DID model:

$$CONTRIBUTION_{it} = \alpha + \beta ORDER_{it} + \gamma CV_{it} + \mu_i + \theta_t + \epsilon_{it}$$ (5)

We also estimate the moderating effects of comment interactions with local collaborators and all collaborators using two separate models:

$$CONTRIBUTION_{it} = \alpha + \beta_1 ORDER_{it} + \beta_2 ORDER_{it} \times LOCCOMS_{it} + \gamma CV_{it} + \mu_i + \theta_t + \epsilon_{it}$$ (6)

$$CONTRIBUTION_{it} = \alpha + \beta_1 ORDER_{it} + \beta_2 ORDER_{it} \times COMS_{it} + \gamma CV_{it} + \mu_i + \theta_t + \epsilon_{it}$$ (7)

where $i$ indexes the developer and $t$ indexes the date. $ORDER_{it}$ is a binary variable that equals one if the state where developer $i$ is located implemented a stay-at-home order on date $t$ or earlier, and zero otherwise. The definitions of the remaining variables are consistent with those in Eqs. (1)–(3).
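Estimating this time-varying DID amounts to constructing $ORDER_{it}$ from each state's order start date and regressing with developer and date fixed effects. The sketch below shows one possible implementation of Eqs. (5)–(7); the column names (such as `order_start`) are illustrative, and developers in control states simply have no start date, so their $ORDER_{it}$ stays zero.

```python
# Sketch of the time-varying DID in Eqs. (5)-(7). ORDER_it equals one from
# the state's order start date onward and zero otherwise; for developers in
# control states order_start is NaT, so the comparison is False throughout.
import pandas as pd
import statsmodels.formula.api as smf

def fit_us_did(panel: pd.DataFrame, moderator: str = None):
    df = panel.copy()
    df["order"] = (df["date"] >= df["order_start"]).astype(int)
    rhs = "order"
    if moderator:  # "loccoms" for Eq. (6) or "coms" for Eq. (7)
        rhs += f" + order:{moderator}"
    formula = (f"contribution ~ {rhs} + repo + tenure + starr + stars"
               " + issuer + issues + commentr + comments + case"
               " + C(developer) + C(date)")  # mu_i and theta_t
    return smf.ols(formula, data=df).fit(cov_type="HC1")
```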
Table 10 reports the results from the estimation of Eqs. (5)–(7). These results satisfy the parallel trends assumption and remain robust when considering an alternative matched sample (see the detailed robustness tests described in Appendix B). The coefficient of $ORDER_{it}$ is insignificant across all specifications, indicating that stay-at-home orders in the US did not have a significant impact on developers' OSS contributions. The insignificance of both moderating effects further corroborates this finding. These findings contrast with the impacts observed during the Wuhan and Xi'an lockdowns, suggesting that the effects identified in the Chinese context may not generalize to the less strict lockdowns implemented in the US.

The contrast between the findings in China and the US may be attributed to underlying differences in the stringency and enforcement of lockdown measures between the two countries. In China, the lockdowns were characterized by strict restrictions that required residents not to leave home except for emergencies. These restrictions were often rigorously enforced and severely limited developers' opportunities for F2F interactions. On the other hand, the stay-at-home orders in the US were less strict, allowing residents to leave their homes for a broader range of activities such as shopping or work. This relatively lenient approach may have allowed US developers to adapt more easily to the new circumstances, leaving an insignificant impact on their work and lifestyles. Consequently, this may have mitigated the negative impacts of the lockdown measures on their OSS contributions. Moreover, the less strict nature of the US orders may not have provided more available time at home for OSS contributions, as developers could still engage in many of their usual activities outside the home.

### Table 7
What explains the changes in OSS contributions during Chinese lockdowns?

| Dependent variable: change in contributions | Wuhan developers (1) | Xi'an developers (2) |
|---|---|---|
| Available time | 0.202*** (0.092) | 0.725*** (0.216) |
| Interruptions | −0.123 (0.083) | −0.389*** (0.159) |
| Flexibility | 0.029 (0.066) | 0.409** (0.189) |
| Work environment | 0.158 (0.094) | 0.129 (0.186) |
| Fear | −0.987*** (0.072) | −0.304 (0.264) |
| Lack of F2F interactions | −0.288*** (0.068) | −0.190** (0.089) |
| Lack of work-life boundary | −0.148 (0.138) | −0.172 (0.190) |
| Lack of self-discipline | −0.013 (0.065) | −0.180 (0.198) |
| Taking care of family | −0.034 (0.069) | −0.105 (0.190) |
| Housework | −0.144*** (0.523) | −0.160 (0.179) |
| Constant | 3.921*** (0.523) | −0.344 (1.751) |
| Observations | 109 | 71 |
| R-squared | 0.850 | 0.614 |

Robust standard errors in brackets.

* $p < 0.1$.
** $p < 0.05$.
*** $p < 0.01$.

### Table 8
Status of stay-at-home orders by state.

| State | Acronym | Order start date | Order end date |
|----------------|---------|------------------|----------------|
| Control group | | | |
| Arkansas | AR | No statewide order | |
| Iowa | IA | No statewide order | |
| Nebraska | NE | No statewide order | |
| North Dakota | ND | No statewide order | |
| South Dakota | SD | No statewide order | |
| Treatment group| | | |
| Louisiana | LA | March 23, 2020 | May 15, 2020 |
| Mississippi | MS | April 3, 2020 | May 11, 2020 |
| Missouri | MO | April 6, 2020 | May 3, 2020 |
| Montana | MT | March 28, 2020 | April 26, 2020 |
| Tennessee | TN | March 31, 2020 | April 30, 2020 |
| Wisconsin | WI | March 25, 2020 | May 26, 2020 |
6. Conclusion

The lockdowns induced by the COVID-19 pandemic have catalyzed a global shift towards WFH, demonstrating its feasibility on an unprecedented scale. While previous research has explored the broader implications of remote work, the nuanced dynamics between F2F interactions and CMC in the context of work productivity remain an intricate and underexplored area. This complexity is particularly salient within technology-driven domains such as OSS development. Our study first leverages two lockdowns in China – Wuhan in 2020 and Xi'an in 2021 – as natural experiments to study their causal impacts on developers' OSS contributions on GitHub. To improve the generalizability and relevance of our findings from the Chinese lockdowns, we further extend our analysis to encompass the impacts of stay-at-home orders implemented across different states of the US during the early stage of the pandemic.

Our findings present a nuanced picture of the impact of lockdowns on developers' OSS contributions. We discovered that the Xi'an lockdown in 2021 corresponded to a 9.0% increase in OSS contributions, while the Wuhan lockdown in 2020 saw a 10.5% reduction. This apparent contradiction is illuminated by our subsequent survey study, which reveals that the differing impacts can be mainly attributed to an adaptation effect related to the COVID-19 pandemic. More specifically, the Xi'an lockdown occurred nearly two years after Wuhan's, during which time numerous city-level lockdowns had been implemented in China. This allowed developers to adapt to the new norm of WFH, leveraging the flexibility and opportunities provided by WFH to increase their OSS contributions. In stark contrast, the Wuhan lockdown, occurring at the onset of the pandemic when the virus was new, severe, and spreading rapidly, created a climate of fear and uncertainty. This atmosphere, compounded by factors such as increased housework responsibilities, significantly impeded Wuhan developers' ability to focus on OSS contributions. However, these once influential factors became insignificant during the 2021 Xi'an lockdown, highlighting the adaptability and resilience of individuals in the context of remote work during large-scale disruptions. Moreover, we found consistent patterns across both Wuhan and Xi'an developers: the lack of F2F interactions significantly reduced their OSS contributions, while increased available time at home positively influenced them. In addition to our study on China, we employed DID analysis to assess the generalizability of our findings by examining the impact of stay-at-home orders in the US on developers' OSS contributions. Interestingly, we found no significant impact of US lockdowns on these contributions. We posit that this may be due to the less strict nature of stay-at-home orders in the US, which may not have significantly disrupted developers' work and lifestyles, thereby exerting minimal effects on their OSS contributions.

Our contributions are threefold.
First, our findings contribute valuable insights into the effects of remote work on productivity, exploring how individuals adapt to remote work norms during prolonged disruptions such as the pandemic. These insights offer stakeholders, including individuals, organizations, and governments, the knowledge needed to prepare for future disruptions and foster sustainable resilience. Second, our findings shed light on the negative impact of reduced F2F interactions, thereby challenging the assumption that CMC can seamlessly substitute for F2F interactions without any detrimental effects on productivity. This is especially pertinent in inherently digital domains such as OSS development. Our study adds a nuanced perspective to the broader discourse on the comparative impacts of CMC vs. F2F interactions on virtual team performance. This contribution is particularly important in today's environment, where the reliance on CMC due to the shift towards WFH has not only intensified but continues to shape the way we work and collaborate, even beyond the pandemic era (Airbnb, 2022; Warren, 2020). Third, unlike previous research that mainly relied on survey methods to investigate the impacts of lockdowns, our study embraced systematic causal analysis methods such as DID analysis. Using empirical data from GitHub, this rigorous approach, reinforced with various robustness tests and complemented by a survey study, established a multifaceted research framework. It opens new avenues for exploring the impact of policy interventions or organizational strategies in response to similar disruptions, thereby extending the applicability and relevance of our findings.

Moreover, our findings may help open-innovation platforms and organizations that depend on collaborative contributions formulate WFH-related strategies or policies (Airbnb, 2022; Warren, 2020). First, these stakeholders may need to recognize that individuals' adaptation to WFH can vary significantly over time and across different contexts, and strategies must be tailored accordingly. For instance, many contextual factors analyzed in our survey should be accounted for, such as changes in work time, interruptions, flexibility, remote work technology conditions, and housework duties. Second, the absence of F2F interactions, a vital component of collaboration, requires the exploration of alternative methods to compensate for this drawback. For instance, platforms could invest in advanced collaboration tools designed to replicate or even enhance the interaction experience in a virtual environment, such as facial recognition systems that can identify and emphasize micro-expressions or emotional cues. Third, the positive impact of increased home time highlights the importance of flexible work policies. These policies should enable individuals to capitalize on the benefits of remote work without sacrificing productivity. Finally, the initial negative impact of fear suggests that emotional support and well-being should be essential components of remote work-related policies or strategies, especially during unprecedented disruptions like the COVID-19 pandemic.

### Table 9
Mean comparisons and t-tests before and after matching for the US lockdowns.

| | Before matching | | | After matching | | |
|---|---|---|---|---|---|---|
| | Treatment group | Control group | Difference | Treatment group | Control group | Difference |
| Weeks | 237.195 | 251.904 | −14.710 *** | 237.195 | 232.121 | 5.074 |
| Student | 0.177 | 0.160 | 0.016 | 0.177 | 0.184 | −0.008 |
| Employee | 0.391 | 0.407 | −0.016 | 0.391 | 0.389 | 0.001 |
| Contact | 1.000 | 1.000 | 0.000 | 1.000 | 1.000 | 0.000 |
| Number of projects | 18.946 | 19.269 | −0.323 | 18.946 | 18.906 | 0.039 |
| Commits | 2072.852 | 2039.565 | 33.286 | 2072.852 | 1813.961 | 258.891 |
| Stars received | 48.550 | 71.115 | −22.565 | 48.550 | 39.534 | 9.016 |
| Issues received | 14.561 | 15.009 | −0.448 | 14.561 | 11.596 | 2.965 |
| Comments received | 44.084 | 47.101 | −3.017 | 44.084 | 32.748 | 11.336 |
| Stars sent out | 53.217 | 53.018 | 0.199 | 53.217 | 49.277 | 3.940 |
| Issues sent out | 25.280 | 26.767 | −1.487 | 25.280 | 23.936 | 1.345 |
| Comments sent out | 138.069 | 163.793 | −25.723 | 138.069 | 137.564 | 0.506 |
| C | 0.019 | 0.025 | −0.006 | 0.019 | 0.021 | −0.001 |
| C++ | 0.033 | 0.043 | −0.011 ** | 0.033 | 0.034 | −0.001 |
| C# | 0.055 | 0.047 | 0.008 | 0.055 | 0.055 | 0.000 |
| Go | 0.011 | 0.017 | −0.006 * | 0.011 | 0.009 | 0.002 |
| Java | 0.101 | 0.115 | −0.014 * | 0.101 | 0.102 | −0.001 |
| JavaScript | 0.211 | 0.188 | 0.023 ** | 0.211 | 0.216 | −0.006 |
| PHP | 0.039 | 0.051 | −0.012 ** | 0.039 | 0.037 | 0.003 |
| Python | 0.123 | 0.133 | −0.011 | 0.123 | 0.123 | 0.000 |
| Ruby | 0.025 | 0.025 | −0.000 | 0.025 | 0.027 | −0.002 |
| Scala | 0.002 | 0.004 | −0.002 | 0.002 | 0.002 | 0.000 |
| TypeScript | 0.018 | 0.020 | −0.001 | 0.018 | 0.015 | 0.003 |
| Collaborators | 1225.186 | 1203.774 | 21.412 | 1225.186 | 1114.715 | 110.471 |
| Local collaborators | 1.377 | 2.207 | −0.830 *** | 1.377 | 1.377 | −0.000 |
| Average age of projects | 89.021 | 95.269 | −6.249 *** | 89.021 | 87.204 | 1.817 |
| Number of projects with GPL | 1.081 | 1.181 | −0.100 | 1.081 | 1.122 | −0.040 |

* p < 0.1.
** p < 0.05.
*** p < 0.01.

Table 10
Regression results for the US lockdowns.

| | (1) | (2) | (3) |
|---|---|---|---|
| ORDER<sub>it</sub> | 0.000 (0.007) | −0.000 (0.007) | 0.000 (0.007) |
| ORDER<sub>it</sub> × LOCCOMS<sub>it</sub> | | 0.001 (0.001) | |
| ORDER<sub>it</sub> × COMS<sub>it</sub> | | | −0.000 (0.000) |
| REPO | 0.221 * (0.120) | 0.221 * (0.120) | 0.221 * (0.120) |
| TENURE | 0.000 * (0.000) | 0.000 * (0.000) | 0.000 * (0.000) |
| STARR | 0.012 * (0.007) | 0.012 * (0.007) | 0.012 * (0.007) |
| STARS | 0.000 (0.002) | 0.000 (0.002) | 0.000 (0.002) |
| ISSUER | 0.012 (0.029) | 0.012 (0.029) | 0.012 (0.029) |
| ISSUES | 0.090 *** (0.028) | 0.090 *** (0.028) | 0.090 *** (0.028) |
| COMMENTR | 0.036 *** (0.009) | 0.036 *** (0.009) | 0.036 *** (0.009) |
| COMMENTS | 0.076 *** (0.017) | 0.076 *** (0.017) | 0.076 *** (0.017) |
| CASE | −0.000 (0.000) | −0.000 (0.000) | −0.000 (0.000) |
| Constant | −0.580 (0.448) | −0.580 (0.448) | −0.580 (0.448) |
| Individual FE | Yes | Yes | Yes |
| Time FE | Yes | Yes | Yes |
| Observations | 222,138 | 222,138 | 222,138 |
| R-squared | 0.049 | 0.049 | 0.049 |

Robust standard errors in brackets.

* p < 0.1.
** p < 0.05.
*** p < 0.01.

Some limitations of our study generate directions and opportunities for future research.
For instance, although it is reassuring that our study leverages two citywide lockdowns in China and statewide stay-at-home orders in the US, the contrasting findings between them highlight the complexity of remote work and suggest the need for further research into the generalizability of our findings across different cultures, industries, and types of work. Second, our study focuses on OSS contributions measured by the number of commits; future research should consider other measures of innovation-related work productivity, such as code quality or creativity.

CRediT authorship contribution statement

Jin Hu: Conceptualization, Methodology, Software, Formal analysis, Investigation, Data curation, Visualization, Writing - original draft, Writing - review & editing. Daning Hu: Conceptualization, Methodology, Supervision, Project administration, Funding acquisition, Writing - original draft, Writing - review & editing. Xuan Yang: Investigation, Funding acquisition, Writing - review & editing. Michael Chau: Supervision, Writing - review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgement

The authors gratefully acknowledge funding from Guangdong Province Focus Research Project (Grant Number: 2019KZDZX1014), Guangdong Province Research Fund (Grant Number: 2019QN01X277), National Natural Science Foundation of China (Grant Numbers: 71971106, 72001099), and Shenzhen Humanities & Social Sciences Key Research Bases.

Appendix A. Questionnaires for the survey analysis

The questionnaire for the Wuhan developers includes the following six questions.

1. Please indicate your choice on the following statements based on your experience before January 23, 2020 (i.e., the day of the Wuhan lockdown). (1 = Strongly disagree, 2 = Disagree, 3 = Neutral, 4 = Agree, 5 = Strongly agree).
1.1 (OnlineFrequency) I often made online comments to GitHub developers in the same city as me (hereinafter referred to as local collaborators) on GitHub. ___
1.2 (OnlinePreference) I enjoyed making online comments to my local collaborators on GitHub. ___
1.3 (OnlineNeed) My project tasks required me to make online comments to my local collaborators on GitHub. ___
1.4 (OfflineFrequency) I often interacted with my local collaborators offline. ___
1.5 (OfflinePreference) I enjoyed interacting with my local collaborators offline. ___
1.6 (OfflineNeed) My project tasks required me to interact with my local collaborators offline. ___

Please answer Questions 2–5 based on your lockdown experience during the five weeks after January 23, 2020, compared to the five weeks before that day.

2. Did the lockdown give you more time available for making OSS contributions on GitHub?

| Option | Code |
|---------------------------------|------|
| Gave me much less time | 1 |
| Gave me less time | 2 |
| Neutral: same as before lockdown| 3 |
| Gave me more time | 4 |
| Gave me much more time | 5 |

3. Did you have more interruptions when making OSS contributions on GitHub?

| Option | Code |
|---------------------------------|------|
| Many fewer interruptions | 1 |
| Fewer interruptions | 2 |
| Neutral: same as before lockdown| 3 |
| More interruptions | 4 |
| Many more interruptions | 5 |
4. Did you have more flexibility for making OSS contributions on GitHub?

| Option | Code |
|---------------------------------|------|
| Much less flexible | 1 |
| Less flexible | 2 |
| Neutral: same as before lockdown| 3 |
| More flexible | 4 |
| Much more flexible | 5 |

5. How was your work environment (e.g., internet bandwidth and hardware) at home for making OSS contributions on GitHub?

| Option | Code |
|---------------------------------|------|
| Much worse work environment | 1 |
| Worse work environment | 2 |
| Neutral: same as before lockdown| 3 |
| Better work environment | 4 |
| Much better work environment | 5 |

6. How would you rate each of the following factors in their respective impacts on your contributions to GitHub during the five weeks after January 23, 2020, compared to the five weeks before that day? (1 = Very low impact, 2 = Low impact, 3 = Neutral, 4 = High impact, 5 = Very high impact).

| Factor | Code |
|---------------------------------------------|------|
| 6.1 Fear related to the COVID-19 pandemic | |
| 6.2 Lack of face-to-face interactions with my collaborators | |
| 6.3 Lack of work-life boundary | |
| 6.4 Lack of self-discipline | |
| 6.5 Taking care of my family | |
| 6.6 Doing housework | |

The questionnaire for the Xi'an developers is the same as that for the Wuhan developers except for the following changes:

1. … before December 23, 2021 (i.e., the day of the Xi'an lockdown). …

Please answer Questions 2–5 based on your lockdown experience during the four weeks after December 23, 2021, compared to the four weeks before that day. …

6. … during the four weeks after December 23, 2021, compared to the four weeks before that day. …

Table A1
Correlation test results for the first survey question.

| Correlation between | Wuhan developers (1) | Xi'an developers (2) |
|---------------------|----------------------|----------------------|
| OnlineFrequency & OfflineFrequency | 0.305 *** | 0.253 ** |
| OnlinePreference & OfflinePreference | 0.423 *** | 0.283 ** |
| OnlineNeed & OfflineNeed | 0.442 *** | 0.540 *** |

* p < 0.1.
** p < 0.05.
*** p < 0.01.

Appendix B. Robustness checks for the US lockdowns

To test the parallel trends assumption for the US lockdowns, we adopt an event-study approach by fitting the following equation:

\[
\text{CONTRIBUTION}_{it} = \alpha + \sum_{k=-n, k \neq -1}^{n} \beta_k T_{itk} + \gamma CV_{it} + \mu_i + \theta_t + \epsilon_{it} \tag{B1}
\]

where \( n \) equals 28. \( T_{itk} \) represents a series of dummies that indicate the chronological distance (in days) between the observation and the actual date when the state where developer \( i \) resides implemented its stay-at-home order. \( k = -1 \) designates the date immediately preceding the treatment, and thus it is omitted from the equation, serving as the reference date.

Fig. B1 shows the estimated coefficients \( \beta_k \) from Eq. (B1). The green vertical line represents the day when the stay-at-home order was enacted. The accompanying gray dotted lines delineate the 95% confidence intervals for each coefficient. Notably, the estimated \( \beta_k \) values for \( k < 0 \) are virtually zero, indicating that there is no significant pre-treatment difference in the contribution trends between the treatment and control groups. Therefore, our DID analysis satisfies the parallel trends assumption, reinforcing the validity of our results for the US lockdowns.
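The main construction step in Eq. (B1) is turning calendar dates into event-time dummies. Below is a hedged sketch of that step in pandas, with illustrative column names of ours; clipping distances beyond ±n into the endpoint bins is one common convention, not necessarily the authors'.

```python
# Sketch of building the T_itk event-time dummies for Eq. (B1): k is the
# distance in days between the observation date and the state's order start.
import pandas as pd

def add_event_time(panel: pd.DataFrame, n: int = 28) -> pd.DataFrame:
    df = panel.copy()
    # Days since the order start; NaT for control states yields missing k,
    # so control-state rows end up with all dummies equal to zero.
    k = (df["date"] - df["order_start"]).dt.days.clip(-n, n).astype("Int64")
    # One dummy per k, dropping k = -1 (the reference day before the order).
    dummies = pd.get_dummies(k, prefix="T", dtype=int).drop(columns=["T_-1"])
    return pd.concat([df, dummies], axis=1)
```

The resulting `T_*` columns can be passed to the same fixed-effects regression used for Eq. (4) above, and plotting their coefficients with confidence intervals reproduces a figure in the style of Fig. B1.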
We also perform another robustness check by re-estimating Eqs. (5)–(7) using an alternative matched sample. This is achieved by incorporating a caliper of 0.1 in the PSM procedure, resulting in a matched sample that includes 2568 pairs of developers across both the treatment and control groups. The summary of t-test results, presented in Table B1, reveals no statistically significant differences between the treatment and control groups at the 10% significance level. This outcome substantiates the comparability of the two groups following the matching process. Table B2 summarizes the results of Eqs. (5)–(7) derived from the alternative matched sample. The coefficients of \( \text{ORDER}_{it} \), \( \text{ORDER}_{it} \times \text{LOCCOMS}_{it} \), and \( \text{ORDER}_{it} \times \text{COMS}_{it} \) are all statistically insignificant. This outcome implies that the implementation of stay-at-home orders in the US did not have a significant influence on developers' OSS contributions.

Table B1
T-tests in the alternative matched sample for the US lockdowns.

| | Before matching | | | After matching | | |
|---|---|---|---|---|---|---|
| | Treatment group | Control group | Difference | Treatment group | Control group | Difference |
| Weeks | 237.195 | 251.904 | −14.710 *** | 236.515 | 232.990 | 3.525 |
| Student | 0.177 | 0.160 | 0.016 | 0.177 | 0.181 | −0.004 |
| Employee | 0.391 | 0.407 | −0.016 | 0.390 | 0.391 | −0.002 |
| Contact | 1.000 | 1.000 | 0.000 | 1.000 | 1.000 | 0.000 |
| Number of projects | 18.946 | 19.269 | −0.323 | 18.651 | 18.921 | −0.270 |
| Commits | 2072.852 | 2039.565 | 33.286 | 2058.688 | 1822.745 | 235.943 |
| Stars received | 48.550 | 71.115 | −22.565 | 48.081 | 39.739 | 8.342 |
| Issues received | 14.561 | 15.009 | −0.448 | 13.848 | 11.662 | 2.186 |
| Comments received | 44.084 | 47.101 | −3.017 | 42.707 | 32.934 | 9.773 |
| Stars sent out | 53.217 | 53.018 | 0.199 | 45.862 | 49.546 | −3.684 |
| Issues sent out | 25.280 | 26.767 | −1.487 | 23.762 | 24.065 | −0.303 |
| Comments sent out | 138.069 | 163.793 | −25.723 | 133.803 | 138.307 | −4.504 |
| C | 0.019 | 0.025 | −0.006 | 0.019 | 0.021 | −0.001 |
| C++ | 0.033 | 0.043 | −0.011 ** | 0.033 | 0.034 | −0.001 |
| C# | 0.055 | 0.047 | 0.008 | 0.055 | 0.055 | 0.001 |
| Go | 0.011 | 0.017 | −0.006 * | 0.011 | 0.009 | 0.002 |
| Java | 0.101 | 0.115 | −0.014 * | 0.102 | 0.103 | −0.001 |
| JavaScript | 0.211 | 0.188 | 0.023 ** | 0.208 | 0.217 | −0.009 |
| PHP | 0.039 | 0.051 | −0.012 ** | 0.040 | 0.037 | 0.003 |
| Python | 0.123 | 0.133 | −0.011 | 0.123 | 0.124 | −0.001 |
| Ruby | 0.025 | 0.025 | −0.000 | 0.025 | 0.027 | −0.002 |
| Scala | 0.002 | 0.004 | −0.002 | 0.002 | 0.002 | 0.000 |
| TypeScript | 0.018 | 0.020 | −0.001 | 0.018 | 0.015 | 0.003 |
| Collaborators | 1225.186 | 1203.774 | 21.412 | 1074.696 | 1118.728 | −44.032 |
| Local collaborators | 1.377 | 2.207 | −0.830 *** | 1.354 | 1.384 | −0.030 |
| Average age of projects | 89.021 | 95.269 | −6.249 *** | 88.861 | 87.490 | 1.370 |
| Number of projects with GPL | 1.081 | 1.181 | −0.100 | 1.075 | 1.127 | −0.052 |

* p < 0.1.
** p < 0.05.
*** p < 0.01.

Table B2
Regression results for the alternative sample for the US lockdowns.

| Dependent variable: CONTRIBUTION<sub>it</sub> | (1) | (2) | (3) |
|---------------------------------------------|-----|-----|-----|
| ORDER<sub>it</sub> | 0.000 (0.007) | 0.000 (0.007) | 0.000 (0.007) |
| ORDER<sub>it</sub> × LOCCOMS<sub>it</sub> | | 0.001 (0.001) | |
| ORDER<sub>it</sub> × COMS<sub>it</sub> | | | 0.000 (0.000) |
| REPO | 0.225 * (0.120) | 0.225 * (0.120) | 0.225 * (0.120) |
| TENURE | 0.000 * (0.000) | 0.000 * (0.000) | 0.000 * (0.000) |
| STARR | 0.012 * (0.007) | 0.012 * (0.007) | 0.012 * (0.007) |
| STARS | −0.006 ** (0.003) | −0.006 ** (0.003) | −0.006 ** (0.003) |
| ISSUER | 0.012 (0.029) | 0.012 (0.029) | 0.012 (0.029) |
| ISSUES | 0.089 *** (0.028) | 0.089 *** (0.028) | 0.089 *** (0.028) |
| COMMENTR | 0.036 *** (0.009) | 0.036 *** (0.009) | 0.036 *** (0.009) |
| COMMENTS | 0.075 *** (0.017) | 0.075 *** (0.017) | 0.075 *** (0.017) |
| CASE | −0.000 (0.000) | −0.000 (0.000) | −0.000 (0.000) |
| Constant | 0.593 (0.451) | 0.592 (0.451) | 0.593 (0.451) |
| Individual FE | Yes | Yes | Yes |
| Time FE | Yes | Yes | Yes |
| Observations | 220,848 | 220,848 | 220,848 |
| R-squared | 0.049 | 0.049 | 0.049 |

Robust standard errors in brackets.

* p < 0.1.
** p < 0.05.
*** p < 0.01.

References

Airbnb, 2022. Airbnb's Design for Employees to Live and Work Anywhere. https://news.airbnb.com/airbnbs-design-to-live-and-work-anywhere/.

Asay, M., 2020. COVID-19 Isn't Slowing Open Source—Watch for Developer Burnout. https://www.techrepublic.com/article/covid-19-isnt-slowing-open-source-watch-for-developer-burnout/.

Bao, L., Li, T., Xia, X., Zhu, K., Li, H., Yang, X., 2022. How does working from home affect developer productivity? – a case study of Baidu during COVID-19 pandemic. Sci. China Inf. Sci. 65, 1–15.

Barber, B.M., Jiang, W., Morse, A., Puri, M., Tookes, H., Werner, I.M., 2021. What explains differences in finance research productivity during the pandemic? J. Finance 76, 1655–1699.

Bergiel, B.J., Bergiel, E.B., Balsmeier, P.W., 2006. The reality of virtual teams. Competition Forum 4, 427–432.

Boden, D., Molotch, H.L., 1994. The compulsion of proximity. In: Friedland, R., Boden, D. (Eds.), NowHere: Space, Time and Modernity. University of California Press, Berkeley, pp. 257–286.

Brandtzæg, P.B., Nov, O., 2011. Facebook use and social capital—a longitudinal study. In: Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media. Barcelona, Spain, pp. 454–457.

Brucks, M.S., Levav, J., 2022. Virtual communication curbs creative idea generation. Nature 605, 108–112.

Butler, J., Jaffe, S., 2021. Challenges and gratitude: a diary study of software engineers working from home during COVID-19 pandemic. In: 2021 IEEE/ACM 43rd International Conference on Software Engineering: Software Engineering in Practice. IEEE, Madrid, Spain, pp. 362–363.

Chen, S., Ma, H., Wu, Q., 2019. Bank credit and trade credit: evidence from natural experiments. J. Bank. Financ. 108, 105616.

Chen, J., Chen, W., Liu, E., Luo, J., Song, Z.M., 2022a.
The Economic Cost of Lockdown in China: Evidence from City-to-city Truck Flows.

Chen, X., Guo, M., Shangguan, W., 2022b. Estimating the impact of cloud computing on firm performance: an empirical investigation of listed firms. Inf. Manag. 59, 103603.

Cochran, W.G., Rubin, D.B., 1973. Controlling bias in observational studies: a review. Sankhya: Indian J. Stat. Ser. A 35, 417–446.

Colombo, G., 2020. Open Source and COVID-19: Open Source Will Come Out Stronger on the Other Side of the Pandemic. https://www.finos.org/blog/open-source-and-covid-19-open-source-will-come-out-stronger-on-the-other-side-of-the-pandemic.

Cramton, C.D., 2001. The mutual knowledge problem and its consequences for dispersed collaboration. Organ. Sci. 12, 346–371.

Cramton, C.D., Webber, S.S., 2005. Relationships among geographic dispersion, team processes, and effectiveness in software development work teams. J. Bus. Res. 58, 755–765.

Crowston, K., Howison, J., Masango, C., Eseryel, U.Y., 2007. The role of face-to-face meetings in technology-supported self-organizing distributed teams. IEEE Trans. Prof. Commun. 50, 185–203.

Cui, R., Ding, H., Zhu, F., 2022. Gender inequality in research productivity during the COVID-19 pandemic. Manuf. Serv. Oper. Manag. 24, 707–726.

Daft, R.L., Lengel, R.H., 1986. Organizational information requirements, media richness and structural design. Manag. Sci. 32, 554–571.

Daniel, S., Stewart, K., 2016. Open source project success: resource access, flow, and integration. J. Strateg. Inf. Syst. 25, 159–176.

Davidson, J., Mannan, U., Naik, R., Dua, J., Jensen, C., 2014. Older adults and free/open source software: a diary study of first-time contributors. In: Proceedings of the International Symposium on Open Collaboration. Association for Computing Machinery, New York, NY, United States, pp. 1–10.

Dennis, A.R., Fuller, R.M., Valacich, J.S., 2008. Media, tasks, and communication processes: a theory of media synchronicity. MIS Q. 32, 575–600.

DiMaggio, P., Hargittai, E., Neuman, W.R., Robinson, J.P., 2001. Social implications of the internet. Annu. Rev. Sociol. 27, 307–336.

Fang, Y., Neufeld, D., 2009. Understanding sustained participation in open source software projects. J. Manag. Inf. Syst. 25, 9–50.

Fang, H., Wang, L., Yang, Y., 2020. Human mobility restrictions and the spread of the novel coronavirus (2019-nCoV) in China. J. Public Econ. 191, 104272.

Foerderer, J., 2020. Interfirm exchange and innovation in platform ecosystems: evidence from Apple's worldwide developers conference. Manag. Sci. 66, 4772–4778.

Ford, D., Storey, M.-A., Zimmermann, T., Bird, C., Jaffe, S., Maddila, C., Butler, J.L., Houck, B., Nagappan, N., 2021. A tale of two cities: software developers working from home during the COVID-19 pandemic. ACM Trans. Softw. Eng. Methodol. 31, 1–37.

Forsgren, N., 2020. Octoverse Spotlight: An Analysis of Developer Productivity, Work Cadence, and Collaboration in the Early Days of COVID-19. https://github.com/blog/2020-05-06-octoverse-spotlight-an-analysis-of-developer-productivity-work-cadence-a-n-colaboration-in-the-early-days-of-covid-19.

Foss, N.J., Jeppesen, L.B., Rullani, F., 2021. How context and attention shape behaviors in online communities: a modified garbage can model. Ind. Corp. Chang. 30, 1–18.

GitHub, 2022a. GitHub Language Support. https://docs.github.com/en/get-started/learning-about-github/github-language-support.

GitHub, 2022b.
How GitHub Builds Software. https://github.com/about.

Hambrick, D.C., Davison, S.C., Snell, S.A., Snow, C.C., 1998. When groups consist of multiple nationalities: towards a new understanding of the implications. Organ. Stud. 19, 181–205.

Hertel, G., Niedner, S., Herrmann, S., 2003. Motivation of software developers in open source projects: an internet-based survey of contributors to the Linux kernel. Res. Policy 32, 1159–1177.

Howard, P.E., Rainie, L., Jones, S., 2001. Days and nights on the internet: the impact of a diffusing technology. Am. Behav. Sci. 45, 383–404.

Hu, J., Hu, D., Yang, X., Chau, M., 2023. Can firms improve performance through external contributions to their open-source software projects? In: Proceedings of the 31st European Conference on Information Systems (ECIS), Kristiansand, Norway.

Huang, L., Zhong, D., Fan, W., 2022. Do social networking sites promote life satisfaction? The explanation from an online and offline social capital transformation. Inf. Technol. People 35, 703–722.

Khalis, A., Mikami, A.Y., 2018. Talking Face-to-Facebook: associations between online social interactions and offline relationships. Comput. Hum. Behav. 89, 88–97.

Kock, N., 2004. The psychobiological model: towards a new theory of computer-mediated communication based on Darwinian evolution. Organ. Sci. 15, 327–348.

Kraut, R.E., Lewis, S.H., Swezey, L.W., 1982. Listener responsiveness and the coordination of conversation. J. Pers. Soc. Psychol. 43, 718–731.

Kraut, R., Kiesler, S., Boneva, B., Cummings, J., Helgeson, V., Crawford, A., 2002. Internet paradox revisited. J. Soc. Issues 58, 49–74.

von Krogh, G., Haefliger, S., 2012. Carrots and rainbows: motivation and social practice in open source software development. MIS Q. 36, 649–674.

Lau, D.C., Murnighan, J.K., 1998. Demographic diversity and faultlines: the compositional dynamics of organizational groups. Acad. Manag. Rev. 23, 325–340.

Leslie, E., Wilson, R., 2020. Sheltering in place and domestic violence: evidence from calls for service during COVID-19. J. Public Econ. 189, 104241.

Lipnack, J., Stamps, J., 1999. Virtual teams: the new way to work. Strateg. Leader. 27, 14–19.

Miller, C., Widder, D.G., Kästner, C., Vasilescu, B., 2019. Why do people give up FLOSSing? A study of contributor disengagement in open source. In: IFIP International Conference on Open Source Systems. Springer, pp. 116–129.

Miller, C., Rodeghero, P., Storey, M.-A., Ford, D., Zimmermann, T., 2021. "How was your weekend?" Software development teams working from home during COVID-19. In: IEEE/ACM 43rd International Conference on Software Engineering. IEEE, pp. 624–636.

Moqri, M., Mei, X., Qiu, L., Bandyopadhyay, S., 2018. Effect of "following" on contributions to open source communities. J. Manag. Inf. Syst. 35, 1188–1217.

Muralidharan, K., Prakash, N., 2017. Cycling to school: increasing secondary school enrollment for girls in India. Am. Econ. J. Appl. Econ. 9, 321–350.

Negoita, B., Vial, G., Shaikh, M., Labbe, A., 2019. Code forking and software development project sustainability: evidence from GitHub. In: Fortieth International Conference on Information Systems, Munich, Germany.

Neto, P.A.d.M.S., Mannan, U.A., de Almeida, E.S., Nagappan, N., Lo, D., Kochhar, P.S., Gao, C., Ahmed, I., 2021. A deep dive into the impact of COVID-19 on software development. IEEE Trans. Softw. Eng.
48, 3342–3360.

NicCanna, C., Razzak, M.A., Noll, J., Beecham, S., 2021. Globally distributed development during COVID-19. In: 2021 IEEE/ACM 40th International Workshop on Software Engineering Research and Industrial Practice. IEEE, pp. 18–25. Virtual Conference.

Ocker, R., Fjermestad, J., Hiltz, S.R., Johnson, K., 1998. Effects of four modes of group communication on the outcomes of software requirements determination. J. Manag. Inf. Syst. 15, 99–118.

O'Mahony, S., Ferraro, F., 2007. The emergence of governance in an open source community. Acad. Manag. J. 50, 1079–1106.

Ralph, P., Baltes, S., Adisaputri, G., Torkar, R., Kovalenko, V., Kalinowski, M., Novielli, N., Yoo, S., Devroey, X., Tan, X., Zhou, M., Turhan, B., Hoda, R., Hata, H., Robles, G., Milani Fard, A., Alkadhi, R., 2020. Pandemic programming. Empir. Softw. Eng. 25, 1–35.

Shah, S.K., 2006. Motivation, governance, and the viability of hybrid forms in open source software development. Manag. Sci. 52, 1000–1014.

Sheridan, A., Andersen, A.L., Hansen, E.T., Johannesen, N., 2020. Social distancing laws cause only small losses of economic activity during the COVID-19 pandemic in Scandinavia. Proc. Natl. Acad. Sci. 117, 20468.

Smite, D., Moe, N.B., Klotins, E., Gonzalez-Huerta, J., 2023. From forced working-from-home to voluntary working-from-anywhere: two revolutions in telework. J. Syst. Softw. 195, 111509.

Sproull, L., Kiesler, S., 1986. Reducing social context cues: electronic mail in organizational communication. Manag. Sci. 32, 1492–1512.

Stam, W., 2009. When does community participation enhance the performance of open source software projects? Res. Policy 38, 1287–1299.

Straus, S.G., McGrath, J.E., 1994. Does the medium matter? The interaction of task type and technology on group performance and member reactions. J. Appl. Psychol. 79, 397–405.

Suphan, A., Mierzejewska, B.L., 2016. Boundaries between online and offline realms: how social grooming affects students in the USA and Germany. Inf. Commun. Soc. 19, 1287–1305.

Suphan, A., Feuls, M., Fieseler, C., 2012. Social media's potential in improving the mental wellbeing of the unemployed. In: Eriksson-Backa, K., Luoma, A., Krook, E. (Eds.), Exploring the Abyss of Inequalities. Springer, Berlin, pp. 10–28.

Tanaka, T., Okamoto, S., 2021. Increase in suicide following an initial decline during the COVID-19 pandemic in Japan. Nat. Hum. Behav. 5, 229–238.

Townsend, A.M., DeMarie, S.M., Hendrickson, A.R., 1998. Virtual teams: technology and the workplace of the future. Acad. Manag. Perspect. 12, 17–29.

Wakefield, R.L., Leidner, D.E., Garrison, G., 2008. Research note—a model of conflict, leadership, and performance in virtual teams. Inf. Syst. Res. 19, 434–455.

Walters, C., Mehl, G.G., Piraino, P., Jansen, J.D., Kriger, S., 2022. The impact of the pandemic-enforced lockdown on the scholarly productivity of women academics in South Africa. Res. Policy 51, 104403.

Wang, G., 2022. Stay at home to stay safe: effectiveness of stay-at-home orders in containing the COVID-19 pandemic. Prod. Oper. Manag. 31, 2289–2305.

Wang, Y., Cai, H., Li, C., Jiang, Z., Wang, L., Song, J., Xia, J., 2013. Optimal caliper width for propensity score matching of three treatment groups: a Monte Carlo study. PLoS One 8, e81045.

Warren, T., 2020. Microsoft Is Letting More Employees Work From Home Permanently.
https://www.theverge.com/2020/10/9/21508964/microsoft-remote-work-from-home-microsoft-2019?fbclid=IwAR08H1r0lBjymHbfw4fYApVhHcdRvK5tv5z2qYTaUYe6c8Q6ynMkXzQxQ4.

Wellman, B., Salaff, J., Dimitrova, D., Garton, L., Gulia, M., Haythornthwaite, C., 1996. Computer networks as social networks: collaborative work, telework, and virtual community. Annu. Rev. Sociol. 22, 213–238.

Wikipedia, 2022. Han Chinese. https://en.wikipedia.org/wiki/Han_Chinese.

Wu, J., Smith, S., Khurana, M., Siemaszko, C., DeJesus-Banos, B., 2020. Stay-at-home Orders Across the Country. https://www.nbcnews.com/health/health-news/here-are-stay-home-orders-across-country-n1168736.

Xu, B., Jones, D.R., Shao, B., 2009. Volunteers' involvement in online community based software development. Inf. Manag. 46, 151–158.

Yang, X., Li, X., Hu, D., Wang, H.J., 2021. Differential impacts of social influence on initial and sustained participation in open source software projects. J. Assoc. Inf. Sci. Technol. 72, 1133–1147.

Zhang, X.M., Zhu, F., 2011. Group size and incentives to contribute: a natural experiment at Chinese Wikipedia. Am. Econ. Rev. 101, 1601–1615.
|
|
{"id": "23c1966dc531032962897bc3e0f635d6c99ae1c0", "text": "Beyond Dependencies: The Role of Copy-Based Reuse in Open Source Software Development\n\nMAHMOUD JAHANSHAHI, DAVID REID, and AUDRIS MOCKUS, University of Tennessee, USA\n\nIn Open Source Software, resources of any project are open for reuse by introducing dependencies or copying the resource itself. In contrast to dependency-based reuse, the infrastructure to systematically support copy-based reuse appears to be entirely missing. Our aim is to enable future research and tool development to increase efficiency and reduce the risks of copy-based reuse. We seek a better understanding of such reuse by measuring its prevalence and identifying factors affecting the propensity to reuse. To identify reused artifacts and trace their origins, our method exploits World of Code infrastructure. We begin with a set of theory-derived factors related to the propensity to reuse, sample instances of different reuse types, and survey developers to better understand their intentions. Our results indicate that copy-based reuse is common, with many developers being aware of it when writing code. The propensity for a file to be reused varies greatly among languages and between source code and binary files, consistently decreasing over time. Files introduced by popular projects are more likely to be reused, but at least half of reused resources originate from \u201csmall\u201d and \u201cmedium\u201d projects. Developers had various reasons for reuse but were generally positive about using a package manager.\n\nCCS Concepts: \u2022 Software and its engineering \u2192 Software creation and management; \u2022 General and reference \u2192 Empirical studies.\n\nAdditional Key Words and Phrases: Reuse, Open Source Software, Software Development, Copy-based Reuse, Software Supply Chain, World of Code\n\n1 INTRODUCTION\n\nSoftware reuse refers to the practice of developing software systems from existing software rather than creating them from scratch [55]. Starting from scratch may demand more time and effort than reusing pre-existing, high-quality code that fits the required task. Developers, therefore, opportunistically and frequently reuse code [48]. Programming for clearly defined problems often starts with a search in code repositories, typically followed by careful copying and pasting of the relevant code [85].\n\nThe fundamental principle of Open Source Software (OSS) lies in its \u201copenness\u201d, which enables anyone to access, inspect, and reuse any artifact of a project. This could significantly enhance the efficiency of the software development process. Platforms such as GitHub increase reuse opportunities by enabling the community of developers to curate software projects and by promoting and improving the process of opportunistic discovery and reuse of artifacts [46]. A significant portion of OSS is intentionally built to be reused, offering resources or functionality to other software projects [39], thus such reuse can be categorized as one of the building blocks of OSS. Indeed, developers in the open source community not only seek opportunities to reuse existing high-quality code, but also actively promote their own well-crafted artifacts for others to utilize [33]. 
Being widely reused not only increases the popularity of the software project and its maintainers, while providing the latter with job prospects [79], but may also bring in new maintainers as well as corporate support [46].

Most commonly, code reuse refers to the introduction of explicit dependencies on the functionality provided by ready-made packages, libraries, frameworks, or platforms maintained by other projects (referred to as dependency-based or black-box reuse). Such external code is not modified by the developer and is generally not committed into the project's repository, but relied upon via a package manager. Copy-based reuse (or white-box reuse), on the other hand, refers to the case where source code (or another reusable artifact) is reused by copying the original code and committing the duplicate into a new repository. It may remain the same or be modified by the developer after reuse. We specifically focus on copy-based reuse in this study.

While it is generally accepted that programs should be modular [75], with internal implementation details not exposed outside the module, copy-based reuse does exactly the opposite. OSS's copy-based reuse, where any source code file or even a code snippet can be reused in another project, may result in multiple, possibly modified instances of the same source code replicated across various files and repositories. These copies may undergo further changes during maintenance, leading to multiple different versions of the originally identical code existing in the latest releases of the corresponding projects. Unifying such a multiplicity of versions in copy-based reuse, to refactor it into a single package that all these projects could depend upon, may not always be a tractable problem.

Moreover, as this reuse process continues across various projects, possibly with some modifications, data related to the initial design, authorship, copyright status, and licensing could be lost [76]. This loss could impede future enhancements and bug-fixing efforts. It might also diminish the motivation of original authors who seek recognition for their work, and lead to legal complications for downstream users.
These issues impact not only those who reuse the code but also any software dependent on at least one package that involves reused code [20].

As the landscape of Open Source Software expands, tracing the origins of source code, identifying high-quality code suitable for reuse, and deciphering the simultaneous progression of code across numerous projects become increasingly challenging. This can pose risks, such as the spread of potentially low-quality or vulnerable code [46] (e.g., orphan vulnerabilities [78]).

Despite the sustained attention and the potential benefits and risks associated with reuse, the exact scale, prevalent practices, and possible negative impacts of OSS-wide reuse have not been thoroughly explored. This is primarily due to the formidable task of tracking code throughout the entirety of OSS [46].

Gaining a more comprehensive understanding of reuse practices could guide future research towards developing methods or tools that enhance productivity while mitigating the inherent risks associated with reuse. Specifically, we aim to quantify several aspects concerning the extent and nature of reuse in OSS, providing the information necessary to investigate approaches that support this common activity and make it more efficient and safer.

We use a measurement framework created by Jahanshahi and Mockus [46] that tracks all versions of project artifacts, referred to as blobs¹, across all repositories. In this approach, the first time each blob is committed to a repository is identified. The (repository, blob) tuples are then sorted based on the commit time of the first appearance of that unique blob in the repository. The repository with the earliest commit time is identified as the originating repository, and the person who made that commit is recognized as the creator of the blob. Reuse instances are then identified by pairing the originating repository with any subsequent repositories that commit the same blob.

Our work investigates how much and what kind of whole-file reuse happens at the scale of OSS, with findings that could help guide future research and tool development to support this common but potentially risky activity. First, we show how the existing studies, by ignoring "small" and inactive projects, miss almost half of the code reused even by the "largest" and most active projects. A more in-depth study is needed to fully comprehend how these abundant yet unseen "dark matter" projects contribute to reuse activity. Second, we theorize about and investigate empirically the properties of artifacts and originating projects that influence the likelihood of file reuse, addressing a key question that previous work, which has predominantly focused on copy detection techniques, has missed. To investigate historic reuse trends, we also introduce a time-limited measure of reuse. Our findings reveal several surprising patterns showing how copying varies with the programming language, the properties of a blob, and the originating projects. These insights could help prioritize and articulate further research and tool development that supports the most common reuse patterns. Third, we obtain responses from 374 developers about the code they have reused or originated. Most respondents write code with an explicit expectation that it will be reused.

¹In alignment with the terminology used in the Git version control system, we use the term "blob" to refer to a single version of a file.
Developers reuse code for several reasons and are not particularly concerned with bugs in the reused code, but they would be willing to use package managers for reused code if such tools were provided. Overall, we find that despite its questionable reputation due to inherent risks, code copying is common and useful, and many developers keep it in mind when writing code.

In summary, we ask the following research questions:

RQ1 How much copy-based reuse occurs? What factors affect the propensity to reuse?
 (a) How extensive is copying in the entire OSS landscape?
 (b) Is copy-based reuse limited to a particular group of projects?
 (c) Do characteristics of the blob affect the probability of reuse?
 (d) Do characteristics of the originating project affect the probability of reuse?

RQ2 How do developers perceive and engage with copy-based reuse?

To foster reproducibility, we have made the replication package for this study, including dataset creation scripts and analysis notebooks, publicly available at https://zenodo.org/records/14743941.

2 BACKGROUND

This section provides the context and foundation for our research. It begins with an exploration of the types of reuse in software supply chains. Following this, we discuss the associated risks, including the potential vulnerabilities, legal issues, and other challenges that can arise from software reuse. The third subsection introduces Social Contagion Theory (SCT), which helps select factors likely to affect the diffusion and adoption of reuse practices within the open source software development community.

2.1 Reuse in Software Supply Chains

A software supply chain comprises the various components, libraries, tools, and processes used to develop, build, and publish software artifacts. It covers all stages from initial development to final deployment, including proprietary and open source code, configurations, binaries, plugins, container dependencies, and the infrastructure required to integrate these elements. The software supply chain ensures that the right components are delivered to the right places at the right times to create functioning software products. Software reuse is one form of the software supply chain that enhances efficiency, reduces costs, and mitigates the risks associated with developing new software from scratch.

In the context of open source software, reuse in software supply chains can be categorized based on how the open source components are integrated and utilized within software projects [69–71].

2.1.1 Dependency-based Reuse. Dependency-based reuse involves using open source libraries and packages as dependencies in a project. These dependencies are typically managed through package managers such as NPM for JavaScript, pip for Python, or Maven for Java. The reliance on these dependencies can introduce vulnerabilities and risks if not properly managed [98]. A web application using the React library, which in turn depends on numerous other libraries, is an example of reuse in this kind of supply chain.

2.1.2 Copy-based Reuse. Copy-based reuse is the type of reuse investigated in this work. In copy-based reuse, code from open source projects is copied directly into a project. For example, a developer might copy a utility function from an open source repository and integrate it into their own project. While this approach is quick, it can lead to challenges in maintaining and updating the copied code. It is essential to track and manage these copies to ensure they are secure and up-to-date [56].
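To make the contrast concrete, the following minimal Python sketch illustrates both styles; the package name and file paths are hypothetical and chosen only for illustration.

    # Dependency-based (black-box) reuse: declare the component and let a
    # package manager resolve it; nothing external is committed to the repo.
    #   requirements.txt:  someutil==1.0.2   (hypothetical package)
    #   code:              import someutil

    # Copy-based (white-box) reuse: the file itself is copied ("vendored")
    # into the repository and committed, creating a new blob whose content
    # is identical to the original.
    import shutil
    shutil.copy("checkout_of_origin/someutil.py", "src/vendored/someutil.py")
    # Once committed, the vendored copy can drift from its origin as the
    # two are maintained independently.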
2.1.3 Knowledge-based Reuse. Knowledge-based reuse involves using knowledge and practices derived from open source projects without directly copying code or using dependencies. It includes the adoption of development methodologies, architectural patterns, and best practices from open source communities, for example, implementing a microservices architecture inspired by successful open source projects. While not explicitly detailed by many researchers, the concept of knowledge-based supply chains is inferred from broader discussions of the open source influence on software development practices [100].

2.2 Associated Risks

While reuse can potentially reduce development costs, it is not always beneficial. It can introduce certain risks that might eventually escalate the overall costs of a project. These risks include, but are not limited to, security vulnerabilities, compliance issues, and the spread of bugs or low-quality code [31, 46].

2.2.1 Security. The relationship between security and reuse has a dual nature: a system can become more secure by leveraging mature dependencies, but it can also become more vulnerable by creating a larger attack surface through exploitable dependencies [35].

In the context of copy-based reuse, extensive code copying can lead to the widespread dissemination of potentially vulnerable code. These artifacts may reside not only in inactive projects (which remain publicly available for others to reuse, potentially spreading the vulnerability further), but also in highly popular and active projects [78].

Understanding the copy-based supply chain helps in identifying potential security risks and implementing appropriate safeguards [73]. Detecting reused code therefore aids in identifying and consistently patching these vulnerabilities across all affected systems [56].

2.2.2 Compliance. Many open source licenses come with specific requirements that must be met. Unintentional reuse of code that is subject to intellectual property (IP) rights or licensing restrictions can lead to legal complications. Understanding the supply chain and detecting reused artifacts ensures compliance with licensing agreements and protects against IP infringements [59, 100].

As software systems evolve, their licenses evolve as well. This evolution can be driven by various factors, such as changes in the legal environment, commercial code being licensed as free and open source, or code being reused from other open source systems. The evolution of licensing can impact how a system or its parts can be subsequently reused [46], so monitoring this evolution is important [19]. However, keeping track of the vast amount of data across the entire OSS landscape is challenging, and as a result many developers fail to adhere to licensing requirements [2, 32].

For example, an investigation of a subset of the code reused in the Stack Overflow environment revealed an extensive number of potential license violations [2]. Even when all license requirements are known, the challenge of combining software components with different and possibly incompatible licenses into a software application that complies with all of them, while potentially having its own license, persists and is of great importance [32].
When individual files are reused, licensing information may be lost; the findings of our study might suggest approaches to identify and remediate such problems.

2.2.3 Quality. Ensuring that all components of the supply chain meet quality standards is essential for the reliability and performance of the final product [9]. Copied code that has not been thoroughly vetted and tested can introduce bugs and defects. By identifying and evaluating such reused code, organizations can ensure that it meets their quality standards [69].

Code reuse is not only assumed to escalate maintenance costs under specific conditions; it is also seen as prone to defects, because inconsistent modifications to duplicated code can result in unpredictable behavior [48]. Additionally, failure to consistently modify identifiers (such as variables, functions, and types) throughout the reused code can lead to errors that often bypass compile-time checks and turn into hidden bugs that are extremely challenging to detect [58].

Apart from the bugs introduced through code reuse, the source code itself could have inherent bugs or be of low quality. These issues can propagate similarly to how security vulnerabilities spread. The patterns of reuse identified in this study could suggest strategies to leverage information gathered from multiple projects with reused code, thereby reducing such risks.

2.3 Social Contagion Theory

Reusing code is an instance of technology adoption. One of the key questions we want to ask is what may affect the propensity of adopting (copying) a blob. Social Contagion Theory (SCT) [14] is a widely used theory for examining dynamic social networks and human behavior in the context of technology adoption [3, 84]. In the field of software engineering, it has been used to explain how developers select software packages [64].

We use SCT to theorize about the dynamics of code reuse by conceptualizing it in terms of exposure, infectiousness, and susceptibility. SCT helps us frame our research questions by providing a structured way to analyze how code reuse spreads within the open source community. Specifically, we explore how developers become aware of reusable code, the inherent qualities of the code that make it more likely to be reused, and the characteristics of projects or developers that make them more likely to adopt reusable code. These dimensions guide the formulation of our research questions, enabling us to systematically investigate the factors influencing reuse activity in open source software. The key value of SCT in our case is to help articulate the factors affecting copy propensity via three dimensions:

1. **Exposure.** Exposure captures the intuitive notion that in order to copy an artifact, one first has to learn about and find it.

2. **Infectiousness.** Infectiousness is the property of the artifact that affects its propensity to be reused.

3. **Susceptibility.** Susceptibility is the property of the destination project or developer that reflects how much benefit they would (or believe they would) derive from reusing the artifact.

First, for a blob (the infectious agent) to be reused, a developer needs to become aware of it. In other words, it needs to be exposed to the open source community (the population). Social coding platforms such as GitHub provide various crowd-sourced signals of project popularity. Developers may consider these characteristics of project popularity or health when choosing what resource to use [23, 61].
These considerations suggest that developers are more likely to be exposed to code in more popular or active projects. Therefore, we use project properties as a proxy for the likelihood of awareness. This primarily addresses RQ1-b and RQ1-d in our study.

The second concept of SCT, infectiousness, holds that a highly virulent infectious agent is more likely to spread. In our context, this can be measured by the characteristics of the blob itself, corresponding to RQ1-c. Most of the literature on reuse has primarily focused on this aspect of the reused resource.

The final concept in our theory is susceptibility, which refers to the vulnerability of the target population to the infectious agent. In our case, this can be approximated by the characteristics of the target project (or author) that reuses the blob, for example, the use value, or how much the blob is needed in the project that copies it. These characteristics are, by definition, highly specific to the target project, making them more challenging to measure. We aim to shed more light on this aspect in RQ2.

3 RELATED WORK AND CONTRIBUTIONS

While the benefits and risks associated with code reuse seem tangible, the extent and types of reuse across the entirety of OSS remain unclear. To prioritize these risks and benefits, and to explore methods to minimize or maximize them respectively, we employ the approach introduced in our previous work [46]. This method allows us to track copy-based reuse on a scale commensurate with the vast size of OSS. The scope of copying activity is not fully encompassed by previous studies based on convenience samples, as we illustrate in the results section.

We are not aware of any other curation system that operates at the level of a blob or finer granularity, nor is there an easy way to determine the extent of OSS-wide copy-based reuse at that level. Methods for identifying reuse, such as the one introduced by Kawamitsu et al. [50], are designed to find reuse between specific input projects and do not easily scale to detect reuse across all OSS repositories [46]. The methods we use to identify and characterize reuse could, therefore, serve as a foundation for tools that expose this difficult-to-observe yet potentially important phenomenon [46]. We acknowledge that the actual extent of reuse is most likely much higher than what we find at blob-level granularity. Nevertheless, we believe the results we present will still be insightful, especially as a lower bound for the extent of copy-based reuse activity in the entirety of OSS.

We first differentiate copy-based reuse from related fields and then discuss our contributions.

3.1 Related Research Areas

To comprehensively understand copy-based reuse, it is essential to discuss two closely related fields: clone detection and the clone-and-own practice. The following discussion focuses on differentiating copy-based reuse from dependency-based reuse, clone detection, and clone-and-own practices, situating these within the broader context of the code reuse literature.

3.1.1 Code Reuse Analysis. Code reuse analysis encompasses techniques and practices that aim to maximize the efficiency and reliability of software development by leveraging existing code. Techniques such as static analysis, dependency analysis, and repository mining help identify reusable components within a codebase [52]. Through these methods, code reuse analysis seeks to reduce redundancy and enhance maintainability.
Frakes and Kang [25] show that systematic code reuse can significantly reduce development time and costs while improving software quality.

3.1.2 Clone Detection. Clone detection is a technique within code reuse analysis for identifying similar or identical code fragments in a codebase. This process involves using tools to detect exact or slightly modified duplicates, which can then be refactored into reusable components. Techniques range from textual and token-based methods to more advanced semantic and abstract syntax tree (AST) analyses [80, 91]. These methods focus on identifying code clones within constrained contexts, often limited to small code snippets within a few projects [92]. Clone detection helps in managing redundancy and maintaining code quality by highlighting areas where code can be simplified and reused [80]. The effectiveness of clone detection tools has been validated in various studies, showing significant improvements in software maintainability [49].

3.1.3 Clone and Own. Clone and own is a practice where existing software components are copied and modified to meet new requirements. This approach is often utilized in product line engineering and in situations where rapid development is important. Clone-and-own allows developers to quickly adapt existing solutions but can lead to maintenance challenges due to the proliferation of similar, independently maintained code fragments [54, 82]. This practice, common in open source development, involves significant modifications and independent maintenance, often leading to divergent development paths [7, 30].

While clone detection focuses on the technical identification of code snippets, the clone-and-own practice highlights the importance of customization and independent management of forked projects. As the clone-and-own practice involves both technical customization and significant social factors, such as community engagement and governance models, understanding these aspects is important for managing forked projects [7, 30]. Although clone-and-own supports the purpose of code reuse by facilitating quick adaptation, it often results in code duplication, complicating long-term maintenance. Research has shown that clone-and-own is prevalent in practice due to its simplicity and effectiveness in the short term [4].

3.1.4 Copy-based Reuse. Copy-based reuse, a form of code reuse, involves copying existing code and potentially modifying it for use in new contexts. This method allows for rapid development but shares the maintenance challenges associated with clone-and-own, as duplicated code must be managed across different parts of the software. In summary, code reuse analysis encompasses techniques like clone detection to manage redundancy, and practices like clone-and-own to adapt existing code for new purposes. While clone detection and code reuse analysis share the goal of improving code quality and maintainability by identifying and managing redundancy, clone-and-own focuses on rapid adaptation rather than efficient redundancy management, despite serving a similar purpose in promoting reuse. Both copy-based reuse and clone detection address code duplication but differ significantly in their methodologies and scopes. Copy-based reuse research, as exemplified by our work, provides a broader, ecosystem-level perspective, incorporating social aspects and the characteristics of entire projects.
In contrast, clone detection focuses on the technical identification of code snippets within specific contexts, while the clone-and-own practice emphasizes customization and independent maintenance of forked projects.

3.2 Contributions

Our contribution in this work has three aspects, as follows.

3.2.1 Accuracy. Our study leverages the World of Code (WoC) infrastructure to analyze reuse across nearly the entire open source software landscape. This allows the capture of instances of copying that would be missed if only a subset of public repositories were analyzed. In contrast, previous studies often focused on samples of mostly "popular" repositories drawn from specific communities or subsets of programming languages. They either concentrated on a specific community (e.g., the Java language or Android apps) [21, 39, 40, 43, 68, 86] or sampled from a single hosting platform (e.g., GitHub) [33, 34]. This, consequently, prevented the identification of inter-community or out-of-sample copies.

Even research with more comprehensive programming language coverage, such as the study by Lopes et al. [60] or the studies by Hata et al. [41, 42], analyzes only a subset of programming languages and additionally uses convenience sampling by excluding less active or "unimportant" repositories. As our results demonstrate, even inactive and "small" projects appear to provide many of the artifacts reused in OSS, even by the "largest" and most active projects.

The existing literature on code cloning primarily consists of empirical studies, case studies, and tool evaluations. Empirical studies typically analyze code clones within specific projects or samples of open source software repositories. These datasets are large but not exhaustive of the entire OSS ecosystem. For example, studies by Juergens et al. [48] and Roy et al. [81] examine hundreds to thousands of files or repositories, providing valuable but partial insights. Case studies offer in-depth analysis of cloning practices within individual projects or organizations, giving detailed context but limiting the scale to the specific cases under study. Tool evaluations involve benchmark studies of clone detection tools, evaluating their performance on curated datasets. While these studies contribute important information about tool effectiveness, they do not cover the entire OSS ecosystem.

Unlike studies that rely on selective sampling, our analysis encompasses nearly the entire open source software ecosystem, providing a broad and necessary foundation for understanding code reuse. This is a fundamental requirement for accurately tracking the origin of files within the entirety of OSS, as it helps uncover trends and patterns that would be biased in analyses based on samples of such data, offering a more accurate understanding of reuse practices.

3.2.2 Methodology and Focus. Copy-based reuse has not been explored as thoroughly as dependency-based reuse (e.g., [15, 26, 74]). For example, Mili et al. [66] have shown that dependency-based reuse can lead to more sustainable software architectures by promoting component-based design and reducing redundancy. Additionally, Brown and Wallnau [11] demonstrated that, by leveraging well-defined interfaces and reusable libraries, dependency-based reuse can significantly improve software maintainability and scalability. Nevertheless, very few, if any, similar analyses exist for copy-based reuse.
Copy-based reuse is potentially no less important, but it is a much less understood form of reuse [46]. Most studies in the copy-based reuse domain focus on clone detection tools and techniques [1, 40, 47, 81, 97] rather than on the characteristics of entire source code files that possibly make reuse more or less likely.

Furthermore, almost all studies we reviewed focus solely on source code reuse, whereas we track all artifacts, whether they are code or other reusable development resources [46]. By using the World of Code research infrastructure, which encompasses nearly the entire OSS ecosystem, we identified and analyzed copying activity at this scale for the very first time.

In contrast to clone detection, which primarily involves identifying similar code snippets within specific directories or domains [45, 90], our research addresses the broader context of entire files and diverse artifacts across the OSS ecosystem, providing a more comprehensive understanding of reuse. Our method bridges the clone detection and clone-and-own approaches by detecting all instances of reuse, whether the copies are kept without any changes or modified after reuse, thereby encompassing both the technical and managerial aspects of code reuse.

In the existing clone detection literature, several methods are employed to identify code clones: text-based, token-based, tree-based, and graph-based techniques. Text-based methods detect clones by comparing raw text, which is straightforward but can be less accurate due to variations in formatting. Token-based methods improve on this by converting code into tokens and detecting similarities at this more abstract level, enhancing accuracy but remaining susceptible to variations in code structure. Tree-based methods parse the code into abstract syntax trees (ASTs) and identify clones by comparing these trees, providing a more structured and semantically meaningful detection. Graph-based methods further abstract code into control flow or data flow graphs, allowing for the detection of more complex, semantic clones [81].

The clone-and-own literature primarily employs these detection methods to understand the broader landscape of code cloning. For example, Juergens et al. [48] utilized a combination of these techniques to analyze cloning practices in software projects. These methods are effective in identifying different types of clones, such as exact, parameterized, and semantic clones, but they often focus on similarities and patterns rather than exact matches.

In contrast, our research employs a method focused on identifying reuse at the blob level, specifically detecting whether exact versions of code have been copied. While it misses instances where a single code snippet has been copied, this approach does not rely on abstractions or patterns. It involves obtaining hashes for all versions of files in the entire open source software ecosystem to detect identical content, ensuring that every version of code is tracked to its origin. This exhaustive and detailed approach allows for a comprehensive analysis of copy-based supply chains at the OSS level.
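As an aside on why hashing makes exact-match detection tractable at this scale: Git derives a blob's identifier deterministically from its content, so two files share an identifier exactly when their bytes are identical, and equality of hashes can stand in for pairwise comparison. The following minimal Python sketch reproduces Git's blob hashing; it illustrates the general idea and is not code from our replication package.

    import hashlib

    def git_blob_hash(content: bytes) -> str:
        # Git blob id: SHA-1 over the header b"blob <size>\0" plus the bytes.
        header = b"blob " + str(len(content)).encode() + b"\0"
        return hashlib.sha1(header + content).hexdigest()

    # Identical bytes yield an identical blob id, regardless of file path or
    # repository, so grouping files by this id finds whole-file copies.
    assert git_blob_hash(b"print('hi')\n") == git_blob_hash(b"print('hi')\n")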
Since software supply chains form a network over the entire OSS, it is not feasible to study them by sampling projects: representative samples from large graphs are notoriously difficult to obtain (see, e.g., [57]). In addition to ensuring that the entire file has been copied and committed, our method easily scales to the entire OSS ecosystem because, by utilizing hashes, it avoids the need to look for similarities among tens of billions of versions. Traditional clone detection techniques would need to be substantially modified to work at this scale. We discuss some of the potential approaches in Section 8.1.

3.2.3 Influencing Factors and Social Aspects. Our study explores how the characteristics of OSS projects influence the propensity for their artifacts to be reused, examining their social aspects. Previously, the focus has been primarily on the desired functionality and the code itself [29, 87], but we also investigate the social aspects of this phenomenon in the open source community.

The literature on clone detection and our research both explore the social aspects of code reuse, but they do so from different perspectives and with varying emphases on social and technical factors. The existing literature on clone detection primarily focuses on the technical aspects of identifying code clones and understanding their impact on software maintenance and quality. For instance, studies by Juergens et al. [48] and Roy and Cordy [80] delve into the reasons for code cloning, such as improving productivity, learning, and avoiding reimplementation of similar functionality. These studies often highlight the technical motivations behind code cloning, such as reusability and rapid prototyping, but they also touch upon social aspects like collaborative development and knowledge sharing within teams. However, the primary emphasis remains on the technical detection and management of code clones.

In contrast, our research takes a broader view by examining how the characteristics of open source software projects influence the propensity for their artifacts to be reused. This includes a detailed analysis of both social and technical factors. Our study explores the diverse motivations and implications of reuse in the OSS community, considering aspects such as project size, community engagement, and the collaborative nature of OSS development. By doing so, we highlight the importance of social dynamics in code reuse, including factors like community contributions, the reputation of projects, and the collaborative environment that fosters code sharing and reuse.

By examining these social and technical factors, our study provides a more comprehensive understanding of the motivations behind code reuse in the OSS community. We draw parallels to other factors influencing copy-based reuse, such as the ease of access to code, the open and collaborative nature of OSS projects, and the role of community support and documentation. This broader perspective allows us to highlight the diverse and sometimes conflicting motivations for code reuse, ranging from technical efficiency to social recognition and collaborative learning.

4 METHODOLOGY

We begin by briefly describing the World of Code infrastructure utilized in our study, followed by the methods introduced in our previous work [46] to identify instances of copying. Next, we explain the time complexity of our method and discuss the rationale behind our choices.
In the second and third subsections, we discuss the methods used to answer each research question in more detail.

To make the subsequent discussion precise, we first introduce a few definitions. The time when each unique blob \( b \) was first committed to each project \( P \) is denoted as \( t_b(P) \). The first repository \( P_o(b) = \text{ArgMin}_P\, t_b(P) \) is referred to as the originating repository for \( b \) (and the first author as its creator). Project pairs consisting of the project with the originating commit and a destination project with one of the subsequent commits producing the same blob, \( (P_o(b), P_d(b)) \), are then identified as reuse instances. The reuse propensity (the likelihood that a blob will be copied to at least one other project) is then modeled based on the type of the file represented by the blob and the activity and popularity characteristics of the originating projects.

4.1 Identification of Reused Blobs

4.1.1 World of Code Infrastructure. Finding duplicate pieces of code and tracking all revisions of that code across all open source projects is a data- and computation-intensive task due to the vast number of OSS projects hosted on numerous platforms [46]. Previous studies on reuse have consequently often focused on a relatively small subset of open source software, potentially missing the full extent of reuse that could only be obtained with a nearly complete collection [46]. The World of Code (WoC) [62, 63] infrastructure aims to address these challenges by regularly discovering, retrieving, indexing, and cross-referencing information from new and updated version control repositories that are publicly available.

WoC operationalizes copy-based reuse by mapping blobs, which are versions of source code files, to all commits and projects where they have been created. This means that copy-based reuse is detected only if an entire file is duplicated without any alterations [46]. If the reuser commits the reused blob before making any modifications, this method will find it; however, if they commit only after altering the original file, it will not be identified. Given this, our study focuses solely on whole-file copying activity. Consequently, different versions of what was originally the same file are treated as distinct entities, since they are different blobs.

4.1.2 Project Deforking. To understand reuse across the entirety of open source software, it is important to identify distinct software projects. Git commits are based on a Merkle tree structure, uniquely identifying modified blobs, and therefore shared commits between repositories typically indicate forked repositories. As a distributed version control system (VCS), Git facilitates cloning (via git clone or the GitHub fork button), resulting in numerous repositories that serve as distributed copies of the same project. While this feature enables distributed collaboration, it also leads to many clones of the original repository [72].

To differentiate copy-based reuse from forking, we use the project deforking map p2P provided in WoC [72].
Using community detection algorithms, this map provides a clearer picture of distinct projects by linking forked repositories p to a single deforked project P based on shared commits.

An advantage of this map over the fork data from platforms like GitHub is that WoC's p2P map is based on shared commits, providing higher recall by not missing forks that did not occur through GitHub's forking option but rather through cloning the repository. Additionally, forks and clones hosted on different platforms cannot be traced easily, but the WoC map is platform-independent and does not have this constraint. Moreover, some forks may diverge significantly from the original repository but are still considered forks by hosting platforms. WoC's deforking algorithm uses community detection via shared commits: if forks diverge substantially through maintenance after forking, the community detection algorithm recognizes them as distinct projects, which reduces false positives and increases precision.

Whenever we mention a "project" in this paper, we are referring to a "deforked project" as defined here. This ensures that our discussions about reuse are based on unique instances of software development projects rather than duplicated efforts through forks.

4.1.3 Dataset Creation. To understand the identification of reused blobs, it is important to explain how the dataset we used [46] was created. Despite the key relationships WoC offers, several obstacles had to be resolved. The initial step was to pinpoint the first instance, denoted as \( t_b(P) \), when each of the approximately 16 billion blobs appeared in each of the almost 108 million projects. To this end, the c2fbb map (the result of a diff on a commit: commit, file, blob, old blob, listing all blobs created by each commit) was first joined with the c2dat map (full commit data) to obtain the date and time of each commit. The result was then joined with the c2P map (commit to project) to identify all projects containing that commit.²

The result is a new c2btP map (commit to blob, time, and project). To create the timeline for each blob, all of that data was sorted by blob, time, and project, resulting in the b2tP map \( (b, t, P) \), which contains only the blob, time, and deforked project that make up the desired timeline of each blob.

Finally, the blob timelines³ were used to identify instances of reuse \( (P_o(b), P_d(b)) \), stored as the Ptb2Pt map, where the first project is the originating project⁴ and the second project is the destination project of the reused blob, meaning the blob was created at a later time in this project. The resulting Ptb2Pt map contains all instances of blob reuse. The data flow of reuse identification is shown in Figure 1.

²See https://github.com/woc-hack/tutorial for more information about the WoC map naming convention.
³All but the first commit time creating the blob for each project were dropped, as a blob is often reused within a repository.
⁴See Section 7 for the limitations in identifying the originating project.
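To make the pairing step concrete, the sketch below derives originating projects and reuse instances from (blob, time, project) tuples. It assumes the tuples fit in memory, whereas the actual pipeline sorts them externally across 128 partitions; the function and variable names are ours, not WoC's.

    from itertools import groupby
    from operator import itemgetter

    def reuse_instances(b2tP):
        # b2tP: (blob, time, project) tuples, one per first appearance of a
        # blob in a project; yields (originating, destination) project pairs.
        for blob, group in groupby(sorted(b2tP), key=itemgetter(0)):
            timeline = list(group)
            origin = timeline[0][2]          # earliest commit: P_o(b)
            for _, _, dest in timeline[1:]:  # later projects: P_d(b)
                yield (origin, dest)

    # Blob "b1" appears first in project A, then in B and C:
    pairs = list(reuse_instances([("b1", 2, "B"), ("b1", 1, "A"), ("b1", 3, "C")]))
    assert pairs == [("A", "B"), ("A", "C")]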
4.1.4 Time Complexity Analysis. To evaluate the complexity and time requirements of our methodology for identifying reuse, we analyze the time complexity of each step and provide a benchmark for execution time on a typical computer setup. The overall time complexity is dominated by the sorting operations involved in processing the large maps. Data preparation and joining involve merging the precalculated maps in WoC, namely the c2fbb, c2P, and c2dat maps. Since these maps are already sorted and split into 128 partitions, we can join them with a complexity of \( 128 \times O(l + m + n) \), where \( l \), \( m \), and \( n \) are the numbers of rows in the respective maps.

We then drop the commit hashes and sort the joined b2tP map by blob, time, and project, which is the most computationally intensive step, with a complexity of \( O(n \log n) \), where \( n \) is the total number of rows in the b2tP map. Identifying reuse instances, given that the data is already sorted by blob, has a complexity of \( O(n) \), where \( n \) is the total number of copy instances.

Using a high-performance workstation as a benchmark (8-core processor at 3.5 GHz, 128 GB RAM, 2 TB SSD), we estimate the execution time for each step. Data preparation and joining, with a linear-time merge, primarily involve reading and writing large files. With a sequential read/write speed of approximately 500 MB/s for SSDs, joining the maps (around 128 billion rows in total) is expected to take roughly 1–2 hours. Sorting the created b2tP map, which requires external sorting of about 74 billion rows, necessitates multiple passes over the data. Based on empirical data, a modern external sorting algorithm with 8 cores can handle around 0.5 billion rows per hour; hence, sorting this map would take approximately 148 hours. Identifying reuse instances, involving efficient I/O operations, is estimated to take 4–6 hours. In total, the entire process is estimated to take approximately 153–156 hours, or about 6.5 days.

Detecting code reuse at a finer granularity than the blob level, such as through syntax tree parsing or text similarity techniques, would offer a more comprehensive view of code reuse. However, these methods involve several computational challenges and resource constraints, making them impractical for our study.

Parsing the abstract syntax tree (AST) of each file to detect structural similarities involves several computational steps. First, each file must be parsed into its AST representation, which is itself an \( O(n) \) operation, where \( n \) is the total number of unique blobs. For our dataset of 16 billion blobs, this parsing step alone would be extremely resource-intensive. Following parsing, comparing the ASTs to identify potential reuse instances would require pairwise comparisons, with a complexity of \( O(n^2) \), resulting in an infeasible \( O((16 \times 10^9)^2) \) cost.

Text similarity measures, on the other hand, such as Levenshtein distance or cosine similarity, involve comparing each blob's contents with every other blob. These methods incur a non-trivial cost for each pair of files, again resulting in an infeasible \( O((16 \times 10^9)^2) \) number of comparisons. Even with optimizations like locality-sensitive hashing or other approximation techniques, the scale of the data renders this approach impractical.

Given the significant computational complexity and resource requirements, detecting code reuse at a finer granularity than the blob level is not feasible for our study. Instead, we focus on blob-level reuse detection, which provides a practical and scalable solution. While this approach is limited to detecting exact file copies, it ensures that the analysis remains within the bounds of available computational resources and time constraints, thereby enabling a thorough and efficient examination of code reuse in the OSS landscape.

Fig. 1. Reuse Identification Data Flow Diagram.
4.2 RQ1: How much copy-based reuse occurs? What factors affect the propensity to reuse?

4.2.1 RQ1-a: How extensive is copying in the entire OSS landscape? To investigate how widespread whole-file copying in OSS actually is, we first establish a baseline: what fraction of blobs were ever reused, and, if reused, to how many downstream projects? Specifically, in RQ1-a, we report the number of blobs, of originating and destination (deforked) projects, and of copy instances across the entire OSS ecosystem. These numbers are not estimates but the actual counts calculated over the complete dataset.

4.2.2 RQ1-b: Is copy-based reuse limited to a particular group of projects? One may argue that the results in RQ1-a are not necessarily important, as only "small" projects may reuse code in a copy-based manner. To see if this is actually the case, we randomly sampled 5 million reuse instances from each of the 128 files into which the data was divided based on the first two bytes of the blob hash. This resulted in a total of 640 million instances for the analysis. This approach ensured that our sample was distributed across the entire dataset, capturing a diverse range of copy instances. The sample of 640 million instances constitutes approximately 2.67% of the entire dataset. Although this is a small fraction of the data, its large absolute size ensures the statistical reliability and representativeness of our analysis.

Before going further, we need to replace the qualitative and, more importantly, subjective terms "small" and "big" projects with quantitative and justified measures. Crowston and Howison [17] and Koch and Schneider [51] have shown that project activity, as measured by commit frequency, is a strong indicator of project health and sustainability. Additionally, the use of stars as a metric is well supported in the literature, as stars represent a form of user endorsement and are correlated with project visibility and perceived quality [77]. We chose these two metrics because both the number of commits and the number of stars are indicators of a project's activity and popularity. Commits reflect the ongoing development and maintenance effort, which is important for the sustainability and evolution of a project. Stars, on the other hand, reflect the community's interest and endorsement, indicating the project's visibility and influence. These metrics are widely used in empirical software engineering research to evaluate the health and impact of open source projects [8, 47].

We define projects with over 100 commits and at least 10 stars as "big" projects. The mean and third-quartile values for the number of commits in our dataset are 46 and 12, respectively. This aligns with established practice in the literature, where thresholds are often set significantly above average to isolate highly active projects; by setting the threshold at more than double the mean, we ensure that only the most active projects are classified as big. Similarly, the threshold of 10 stars is set based on the mean of 2.33 and the third-quartile value of 0 for stars.
This indicates that the majority of projects receive few or no stars, reflecting their low popularity and community engagement. By selecting projects with at least 10 stars, we focus on those with significant community recognition, capturing less than 1% of the dataset but representing the most influential projects.

The thresholds chosen for the "small" group, on the other hand, are projects with no stars and fewer than 10 commits, ensuring the projects are indeed small and inactive. This group, comprising 62% of projects, includes those with minimal activity and engagement, consistent with findings by Gousios and Spinellis [37] that a large proportion of open source projects are relatively inactive. We consider all other projects, falling into neither the big nor the small category, as the "medium" group. The medium group captures the middle ground, excluding only the extremes, and thus provides a balanced representation of the majority of active projects.

Using this taxonomy, we counted the number of unique blobs involved in copy instances between groups. It should be mentioned that a blob can have several downstream projects that do not necessarily fall into the same group; we therefore considered the biggest downstream project for our analysis. For example, if a blob originated in a medium project and was reused by both a big and a small project, we count it in the "medium to big" category. Considering the biggest downstream project for each unique blob ensures that the most significant reuse instances are captured. This approach is supported by research indicating that the impact of code reuse is often determined by the size and activity of the downstream projects utilizing the code [68, 95]. By focusing on the largest downstream project, we ensure that our analysis reflects the most substantial and influential reuse cases of a particular blob.
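A minimal sketch of this classification and counting scheme, using the thresholds defined above (the function names and tuple layout are ours, for illustration only):

    def size_group(commits: int, stars: int) -> str:
        # Thresholds as defined above; everything else is "medium".
        if commits > 100 and stars >= 10:
            return "big"
        if commits < 10 and stars == 0:
            return "small"
        return "medium"

    RANK = {"small": 0, "medium": 1, "big": 2}

    def reuse_category(origin, destinations):
        # A blob with several downstream projects is counted once, toward
        # its biggest downstream group; origin/destinations are
        # (commits, stars) tuples.
        biggest = max((size_group(*d) for d in destinations), key=RANK.get)
        return size_group(*origin) + " to " + biggest

    # A blob from a medium project, copied by a small and a big project:
    assert reuse_category((50, 3), [(5, 0), (500, 40)]) == "medium to big"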
4.2.3 RQ1-c: Do characteristics of the blob affect the probability of reuse? The third part of RQ1 focuses on the properties of the reused artifacts. To address it, we obtained a large random sample of blobs comprising 1/128 of all blobs. We have to point out that, unlike RQ1-b, where we randomly sampled copy instances (meaning all the blobs involved were reused at least once), here we sample from the b2tP map, which includes all blobs, whether they have been reused or not. Our dataset is divided into 128 files based on the first two bytes of the blob hash. Hash functions, by design, distribute input data evenly across the output space, so dividing the data by hash ensures a uniform distribution across the resultant files [67]. By using one of these 128 files as our sample, and given the vast size of the dataset, we ensure that it is an unbiased representation of the entire dataset and that the sample size is sufficient to achieve high statistical power and accuracy in our analyses.

We then employed a logistic regression model with the response variable being one for reused blobs and zero for non-reused blobs. Logistic regression is a robust statistical method used to model the probability of a binary outcome based on one or more predictor variables. It is widely used in empirical software engineering to understand factors influencing software development practices [44]. By using logistic regression, we can quantify the effect of various predictors on the likelihood of a blob being reused.

In this research question, we are concerned with infectiousness, in terms of our Social Contagion Theory framing. Specifically, we are looking for properties of artifacts that affect their propensity to be reused. The first predictor in our model is the programming language of the blob. Different programming languages are associated with distinct package managers, development environments, and community cultures, which can influence reuse practices [6]. For example, the ease of dependency management in languages like Python (via pip) or JavaScript (via NPM) might facilitate dependency-based reuse more than in languages with less mature package management systems. Thus, including the programming language as a predictor helps capture these contextual differences. We anticipate that source code in languages such as C, which lack a package manager, is likely to be copied more frequently than source code in languages with sophisticated package managers, such as JavaScript.

The second predictor is the time of blob creation. This factor helps account for temporal dynamics by indicating the period during which a blob was created, reflecting different reuse practices over time. We hypothesize that older blobs were more likely to be reused because fewer reusable artifacts were available in the OSS landscape at the time. However, the time of creation inherently includes the effect of a blob's availability duration \( (t_b(P_d) - t_b(P_o)) \): older blobs have had more time to be discovered and reused. Previous research by Weiss and Lai [95] indicates that the age and visibility of code artifacts influence their reuse.

To isolate and examine the influence of the creation period without the confounding effect of longer availability, we introduce the concept of time-limited reuse. By focusing on copies occurring within specific time intervals after the blob's creation, we remove the advantage of longer visibility and can better assess how the creation period itself influences reuse⁵. We evaluated both one-year and two-year intervals and found similar results, which enhances the robustness of our conclusions. To maintain conciseness and avoid repetition, we report the findings for the two-year interval, which balances a sufficient observation window for reuse events against the practical need for concise reporting. Consequently, we excluded blobs created after May 1, 2020, ensuring that all blobs had at least two years to be potentially reused and providing a consistent time frame for the analysis [96]. This approach ensures that our findings are not skewed by varying availability periods.

The third predictor is whether the blob is source code or a binary. We hypothesize that binaries, identified by their git treatment or by file extensions such as tar, jpeg, or zip, may exhibit different reuse patterns compared to source code. We expect that binary files, such as images, might be copied more often because they are easy to understand and reuse but difficult to recreate. Unlike other types of files, developers cannot easily extract specific parts or functionalities from binary files. That is, source code blobs are directly reusable and modifiable, whereas binaries must be reused as-is, without modification. This distinction is important, as it affects the ease or necessity of reuse [27]. Therefore, when it comes to whole-file reuse, which is our definition of reuse in this work, we anticipated that binary blobs are more likely to be copied.

⁵This definition is used solely for the purposes of our regression model and subsequent analysis. It is not applied in RQ1-a, RQ1-b, or RQ2.
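A model along these lines could be fit as sketched below. The data frame and column names are placeholders rather than our replication-package code, and encoding the creation period as a categorical variable is one plausible choice among several.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Placeholder sample: one row per blob, with the predictors discussed above.
    blobs = pd.DataFrame({
        "reused":    [1, 0, 1, 0, 1, 0, 0, 1],   # copied to >= 1 other project?
        "language":  ["C", "C", "C", "C", "JavaScript", "JavaScript",
                      "JavaScript", "JavaScript"],
        "period":    ["pre2015", "pre2015", "post2015", "post2015",
                      "pre2015", "pre2015", "post2015", "post2015"],
        "is_binary": [1, 1, 0, 0, 0, 1, 0, 1],
    })

    # Logistic regression: log-odds of reuse as a function of blob properties.
    model = smf.logit("reused ~ C(language) + C(period) + is_binary",
                      data=blobs).fit()
    print(model.summary())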
Source code blobs are directly reusable and modifiable, whereas binaries might be reused as-is without modification. This distinction is important as it affects the ease or necessity of reuse [27]. Therefore, when it comes to whole-file reuse, which is our definition of reuse in this work, we anticipated that binary blobs are more likely to be copied.

The last factor we hypothesize might affect the propensity of a blob to be reused is its size. The size of a blob can influence its reuse for several reasons. Larger blobs may contain more functionality, making them more attractive for reuse. Conversely, smaller blobs may be simpler to integrate into existing projects. Previous research by Capiluppi et al. [12] and Mockus [68] has indicated that the size of code artifacts can impact their maintainability, comprehensibility, and ultimately their reuse.

To investigate whether a difference exists between the sizes of copied and non-copied blobs, we exclude binary blobs from the analysis. The size of binary blobs is not comparable to the size of source code blobs due to their fundamentally different nature. Binary blobs often include compiled code, media files, or compressed archives, which do not provide a meaningful comparison to plain text source code in terms of size. Because of these differences, and because including binary blobs could skew the results and lead to misleading conclusions, we did not incorporate blob size as a predictor in our logistic regression model. Instead, we perform a t-test to compare the sizes of copied and non-copied blobs. The t-test is a robust statistical method used to determine whether there is a significant difference between the means of two groups [88]. By applying the t-test, we can rigorously assess whether blob size influences the likelihood of reuse.
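The sketch below illustrates such a comparison; the data is synthetic, and the choice of Welch's variant (which does not assume equal variances) is our assumption, since the paper does not state which t-test variant was used.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
# Synthetic stand-ins for the sizes (in bytes) of copied and
# non-copied source-code blobs, with binaries already excluded.
copied_sizes = rng.lognormal(mean=5.0, sigma=1.5, size=10_000)
noncopied_sizes = rng.lognormal(mean=5.3, sigma=1.5, size=10_000)

# Welch's t-test (equal_var=False) is a safe default when the two
# groups may have unequal variances and sizes.
t_stat, p_value = ttest_ind(copied_sizes, noncopied_sizes, equal_var=False)
print(f"t = {t_stat:.1f}, p = {p_value:.3g}")  # negative t: copied smaller
```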
4.2.4 RQ1-d: Do characteristics of the originating project affect the probability of reuse? The fourth part of RQ1 concerns the chances of finding or being aware of a blob, approximated by signals at the project level. This is the exposure factor in Social Contagion Theory. To conduct this study, we use WoC's MongoDB project database to randomly sample one million projects, comprising nearly 1% of all projects indexed by WoC, to achieve a balance between statistical validity and computational feasibility. A sample size of one million is large enough to provide a representative snapshot of the entire population.

We then search the reuse instances \( (C_1(\% > 3), C_1(\% < 3)) \) in our Ptb2Pt map to determine if the project originated at least one reused blob. A logistic regression model with the response variable being one if the project has introduced at least one reused blob (and zero otherwise) is then constructed. The predictors in the project-level model include the number of commits, blobs, authors, forks, the earliest commit time, the activity duration of the project (the time between the first and the last commit in that project), the binary ratio (the ratio of binary blobs to total blobs), and the programming language. We also use the number of GitHub stars for each project as a predictor; in WoC, this star data is sourced from GHTorrent [36].

The choice of these predictors for our model is based on the current literature on relevant project properties.

- **Number of Commits.** The number of commits is a strong indicator of project activity and maintenance. Koch and Schneider [51] show that projects with higher commit frequencies tend to have more active development and are more likely to be reused due to their perceived reliability and continuous improvement.

- **Number of Blobs.** The number of blobs represents the volume of content and potential reusable components. Larger projects with more blobs are likely to offer more opportunities for reuse [68]. It can also indicate the project's complexity and modularity: projects with more files may be more modular and provide more reusable components.

- **Number of Authors.** The number of authors reflects the collaborative nature of a project. Projects with more contributors tend to have diverse expertise, which supports innovation and decentralized communication, improving the development process [17] and potentially increasing the likelihood of reuse.

- **Number of Forks.** The number of forks is a proxy for the project's popularity and community engagement. Projects with more forks are often viewed as valuable and trustworthy [93], increasing their reuse potential.

- **Earliest Commit Time and Activity Duration.** The earliest commit time and the activity duration provide insights into the project's maturity and stability. Older and long-active projects are more likely to be well-established and reused [28].

- **GitHub Stars.** GitHub stars are a form of social endorsement, indicating community approval and interest. Projects with more stars are likely to be considered high-quality and reliable, making them more attractive for reuse [8].

- **Binary Ratio.** The binary ratio, defined as the ratio of binary blobs to total blobs, can impact the reuse potential of a project. Binary blobs, such as compiled code or media files, often indicate pre-packaged functionalities or resources that are ready for use. A higher binary ratio may suggest that a project provides ready-to-use components, which can facilitate reuse [68].

Regarding language assignment, at the blob level we used WoC's b2sl map, which detects a blob's language based on its file extension. This method is straightforward and effective for identifying the programming languages of individual blobs. Nevertheless, assigning a primary language to a project is more complex because most projects use multiple languages. WoC's MongoDB project database provides counts of files with each language extension, allowing us to pick the most frequent extension as the project's main language. For our study, we considered only a subset of blobs, specifically originating blobs (blobs first seen in OSS within the project), and assumed the most common language among these blobs as the project's primary language. This approach aligns with the practice of determining the dominant language based on primary contributions [94].

4.3 RQ2: How do developers perceive and engage with copy-based reuse?

The second research question in our study aims to triangulate the quantitative results and understand how developers perceive and engage with copy-based reuse. While quantitative research often focuses on metrics such as frequency, intensity, or duration of behavior, qualitative methods are better suited to explore the beliefs, values, and motives underlying these behaviors [13].
Using a questionnaire for triangulation allows us to obtain self-reported data, which can confirm or challenge the quantitative findings. This method helps identify any discrepancies and provides a deeper understanding of participant behavior [18]. In our study, the questionnaire included a direct question ("Did you create or copy this file?") to gather self-reported data on whether participants copied the blob, offering a direct measure to compare against the quantitative results.

Additionally, based on the Social Contagion Theory (SCT), we hypothesize that the characteristics of the destination project and/or author influence reuse activity. However, treating all reusers the same could be problematic, as developers may have fundamentally different reasons for reuse. Motivations for reuse can vary widely based on individual needs, project requirements, and perceived benefits from the reused code [24, 68]. Our primary focus was to understand these motivations in order to categorize different types of reuse, potentially providing more insight into measuring susceptibility for future research. By categorizing motivations, we aim to identify distinct patterns and factors influencing reuse behavior, facilitating the development of targeted strategies to enhance code reuse practices. This approach aligns with qualitative research methods that seek to explore complex phenomena through detailed, contextualized analysis [16].

To gain insights into the motivations behind copy-based reuse, we conducted an online survey targeting both the authors of commits introducing reused blobs and the authors of commits in the originating repositories. The survey aimed to capture a range of experiences and perceptions related to copy-based reuse.

4.3.1 Survey Content and Questions. The survey included questions about the nature of the file, why it was needed, how it was chosen, and whether developers would use tools to manage reused files. General questions about the repositories and developers' expertise were also included. Notably, the question about the reason for needing the file was open-ended, to capture unbiased and detailed responses about the motivations for reuse. All the questions were optional, except for the very first one, which asked if the respondent had created or reused the file. To avoid provoking legal and ethical concerns about copy-based reuse, we chose not to ask directly why developers copied the file. Instead, we asked: "Why was this file needed? How did it help your project?"

Furthermore, we asked developers if the project in which the file resides was intended to be used by other people. Understanding whether creators intend for their resources to be reused helps assess the cultural and strategic aspects of OSS development. If a significant portion of creators design their code with reuse in mind, it indicates a collaborative ecosystem where resources are shared and built upon.

We also asked a series of Likert-scale questions (on a scale from 1 to 5), as follows.

- "To what extent did this file help you?" - Gauging how helpful creators and reusers find the reused blobs provides quantitative data on the perceived value of the reused code. Comparing the ratings between creators and reusers highlights any discrepancies or alignment in perceived usefulness.
- "To what extent were you concerned about potential bugs in this file?" - Investigating reusers' concerns about bugs in reused code sheds light on the perceived risks associated with this practice. Understanding the level of concern can indicate how much trust reusers place in the original code's quality.
- "How important is it for you to know if the original file has been changed?" - Understanding reusers' concerns about changes in the original files helps identify potential issues related to the stability and continuity of reused code. Frequent changes can disrupt the functionality of dependent projects.
- "How likely would you use a package manager which could handle changes to this file if there was one?" - Understanding the likelihood of reusers adopting a package manager, if available, provides insights into the demand for tools that can streamline and manage code reuse.

4.3.2 Sampling Strategy. To ensure a representative and comprehensive sample, we stratified the data along several dimensions. Stratified sampling ensures that all relevant subgroups are adequately represented in the survey, enhancing the generalizability of the findings [16]. By considering multiple dimensions such as productivity, popularity, copying patterns, file types, and temporal aspects, we ensure a comprehensive analysis that captures the diversity of reuse behaviors in the OSS community (a sampling sketch follows the list):

- **Productivity and Popularity**: Based on the number of commits and stars, we differentiated between high and low productivity/popularity projects (similar to RQ1-b).
- **Copying Patterns**: We distinguished between instances where only a few files were copied versus multiple files, as these might indicate different reuse behaviors.
- **File Extension**: We included various file types and programming languages to capture a diverse range of reuse scenarios.
- **Temporal Dimensions**: We considered the blob creation time and the delay from creation to reuse to understand temporal patterns in reuse behavior.

---

\(^6\)The survey and its procedure were approved by our institutional review board, ensuring that it adhered to ethical guidelines for research involving human subjects.

\(^7\)See online appendix for survey questions.
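The following sketch illustrates one way to draw such a stratified sample; the frame, column names, and stratum sizes are invented for illustration and do not reproduce the study's actual sampling code.

```python
import pandas as pd

# Toy table of copy instances with illustrative stratification columns.
df = pd.DataFrame({
    "popularity":   ["high", "low", "high", "low"] * 250,
    "files_copied": ["few", "few", "many", "many"] * 250,
    "ext":          ["py", "js", "c", "jpeg"] * 250,
})

# Draw a fixed number of instances from every stratum so that rare
# combinations remain represented among the survey targets.
sample = (df.groupby(["popularity", "files_copied", "ext"])
            .sample(n=5, random_state=42))
print(len(sample))  # 5 per non-empty stratum
```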
4.3.3 Survey Design. For each copy instance, we targeted the author of the commit introducing the blob into the destination repository and the author of the commit in the originating repository. This dual perspective allowed us to capture both the originator's and the reuser's viewpoints, offering a more comprehensive understanding of the reuse dynamics.

We conducted three rounds of surveys, progressively expanding the sample size and refining the questions based on feedback and preliminary results. We chose this three-step design to ensure a thorough and iterative approach to understanding developer motivations behind copy-based reuse.

1. We handpicked 24 developers (12 creators and 12 reusers) for an initial survey with open-ended questions. This round aimed to gather in-depth qualitative data and identify key themes. This small, purposive sample size allows for deep, exploratory insights, which are important for the initial stages of qualitative research [38].

2. The survey was sent to 724 subjects (329 creators and 395 reusers) with a mix of open-ended and multiple-choice questions. This round helped validate and refine the themes identified in the first round. The increased sample size provides more data to ensure that the themes and patterns observed are not idiosyncratic but rather indicative of broader trends. This intermediate sample size balances the need for more extensive data with the ability to retain qualitative depth [65].

3. The survey was expanded to 8,734 subjects (2,803 creators and 5,931 reusers), with most questions being multiple-choice to facilitate quantitative analysis, except for the open-ended question about the reason for needing the file. The large sample size in this final round ensures that the findings are statistically significant and generalizable across the broader population of developers involved in copy-based reuse. This sample size aligns with recommendations for achieving sufficient statistical power in survey research [53].

The reason behind the seemingly arbitrary numbers of survey subjects in the three rounds is that, after sampling our data, we had to perform data cleansing and preparation to reach the survey target audience, which removed some samples. Initially, we chose sample sizes of 30, 1,000, and 10,000 respondents for the three rounds respectively, but after the data cleansing process, the actual numbers were lower.

4.3.4 Thematic Analysis. Thematic analysis allows us to systematically identify patterns and themes within qualitative data, providing deep insights into the reasons behind copy-based reuse [10]. To analyze the survey responses, we followed a structured thematic analysis process as outlined by Yin [99]:

1. **Compiling**: The first author compiled all responses.

2. **Disassembling**: Each author individually analyzed and coded the responses to identify ideas, concepts, similarities, and differences [5, 89].

3. **Reassembling**: The coded responses were organized into meaningful themes by each author independently, focusing on identifying different types of reuse [10].

4. **Interpreting and Concluding**: The authors discussed and compared the themes, clarifying and organizing them to ensure a coherent and comprehensive understanding. The final themes were then used to reclassify and interpret all survey responses.

5 RESULTS & DISCUSSIONS

The numbers presented in this section are derived from version U of WoC, which was the most recent version available at the time of this analysis.

---

\(^8\)Only if they had explicitly disclosed their email address on their public profile.

\(^9\)https://bitbucket.com/swsc/overview

5.1 RQ1: How much copy-based reuse occurs? What factors affect the propensity to reuse?

5.1.1 RQ1-a: How extensive is copying in the entire OSS landscape? We identified nearly 24 billion copy instances (unique tuples containing the blob and the originating and destination projects) encompassing more than 1 billion distinct blobs. With approximately 16 billion blobs in the entire OSS landscape (as approximated by WoC), 6.9% of the blobs have been reused at least once, and each reused blob is copied to an average of 24 other projects (see Table 1).

Table 1. Extent of Copy-based Reuse in OSS

| | Count | Total | % |
|----------------|----------------|----------------|-----|
| Reuse instances | 23,914,332,270 | - | - |
| Blobs | 1,084,211,945 | 15,698,467,337 | 6.9% |
| Originating projects | 31,706,416 | 107,936,842 | 29.4% |
| Destination projects | 86,483,266 | 107,936,842 | 80.1% |

Nearly 32 million projects (about 30% of the nearly 108 million deforked OSS projects indexed by WoC) originated at least one reused blob. Over 86 million projects have copied these blobs, meaning 80% of OSS projects have reused blobs from another project at least once.

RQ1-a Key Findings:
1. We identified nearly 24 billion copy instances encompassing more than 1 billion distinct blobs.
2. 6.9% of all the blobs in the entire OSS landscape have been reused at least once.
3. About 30% of all OSS projects originated at least one reused blob, and 80% of projects have reused blobs at least once.

The extensive reuse observed highlights the efficiency gains in OSS development, as projects benefit from existing code to accelerate development cycles and reduce costs. The widespread reuse also raises security concerns, as vulnerabilities in copied code can propagate across numerous projects. This necessitates improved vulnerability detection and management practices to ensure the integrity of reused code. Additionally, license violations due to improper code reuse can lead to legal challenges and compliance issues, underscoring the importance of clear licensing and adherence to open source policies. Furthermore, our identification of blob-level reuse, which only accounts for exact matches and not slight modifications, suggests that the actual extent of code reuse might be even higher. The findings advocate for the development of better tools and infrastructure to manage copy-based reuse, including automated detection of security and legal risks, and tools for maintaining code quality in reused components.

5.1.2 RQ1-b: Is copy-based reuse limited to a particular group of projects? The numbers above already demonstrate the prevalence of copy-based reuse in the OSS community. To understand how this reuse activity is distributed across different groups of projects, we constructed a contingency table as explained in the methods section. Each blob's originating project is unique and falls into one of three categories (big, medium, or small). Downstream projects, however, are not unique, so we consider the largest downstream project for each blob.

Our analysis revealed nearly 112 million unique blobs reused in our 640 million sampled copy instances, with nearly 13 million of these blobs reused by at least one big project (see Table 2). This indicates that more than 11% of blobs are reused at least once by at least one big project, showing that copy-based reuse is not limited to small projects but is a widespread phenomenon in the OSS community.

Table 2. Blob Counts in Reuse Sample (rows: upstream project group; columns: group of the biggest downstream project)

| Upstream Projects | Big | Medium | Small | Total |
|-------------------|-----|--------|-------|-------|
| Big | 6,748,621 | 22,273,811 | 6,515,122 | 35,537,554 (31.8%) |
| Medium | 5,348,651 | 36,434,732 | 14,552,148 | 56,335,531 (50.3%) |
| Small | 691,644 | 10,151,838 | 9,231,618 | 20,075,100 (17.9%) |
| Total | 12,788,916 (11.4%) | 68,860,381 (61.5%) | 30,298,888 (27.1%) | 111,948,185 |

However, it is still unclear if these reused blobs are predominantly introduced by big projects. If this were the case, one could presume that these blobs are mostly of good quality and not error-prone, making the costs of managing and tracking code propagation through such reuse potentially outweigh the benefits. Sampling copy instances revealed that big projects are responsible for only about 30% of reused blobs, while the remaining 70% are introduced by medium and small projects. Specifically, nearly 18% of these blobs are introduced by small projects, with the remaining 50% coming from medium projects. Furthermore, even for big projects, almost 50%\(^{10}\) of the blobs they reuse originate from medium and small projects (see Table 2).
Therefore, it is evident that big projects are not the only upstream sources for copy-based reuse; many blobs introduced by medium and small projects are being widely reused.

Even if all widely reused blobs were exclusively introduced by big projects, copy-based reuse would still require management for several reasons. For example, security vulnerabilities may continue to spread even after the main project has fixed the issue [78].

RQ1-b Key Findings:

1. 32% of reused blobs originate from big projects, which comprise 1% of the total projects.
2. 18% of reused blobs originate from small projects, which make up 62% of the total projects.
3. 50% of reused blobs originate from medium projects, which represent 37% of the total projects.
4. Nearly 50% of blobs reused by big projects originate from medium and small projects, highlighting significant cross-category reuse.

Our findings demonstrate that a non-negligible portion of reused code in the OSS community comes from medium and small projects, challenging the assumption that high-quality code predominantly originates from large projects. This implies a diverse quality spectrum in reused code and underscores the importance of ensuring quality and security across all project sizes, as vulnerabilities in smaller projects can propagate widely. Tools that can track the origin and usage of blobs are essential to ensure timely updates and fixes across the OSS ecosystem, mitigating risks associated with vulnerabilities and outdated code. The widespread nature of code reuse across projects of all sizes emphasizes the need for quality assurance, effective management, and community collaboration to maintain the health and sustainability of the OSS landscape.

5.1.3 RQ1-c: Do characteristics of the blob affect the probability of reuse? In this section, we first demonstrate the reuse trends, followed by the logistic regression model predicting the probability of a blob being reused. Additionally, we present the reuse propensity per language and show the difference in blob size between reused and non-reused blobs. Finally, we discuss a case study using JavaScript as an example.

\(^{10}\)\(\frac{5{,}348{,}651 + 691{,}644}{12{,}788{,}916} \approx 47\,\%\)

**Reuse Trends.** As explained in the methods section, we use a 2-year-limited copying definition in the RQ1-c and RQ1-d models and results. This means that we consider a blob reused only if it has been reused within 2 years of its creation. With this definition, 7.5% of blobs have been reused. Figure 2a shows the total counts of new blobs and copied blobs for each quarter since the year 2000\(^{11}\). Both counts exhibit rapid growth, although the growth in new blob creation appears to outpace that of copying. To investigate this difference, Figure 2b shows the reuse propensity measured via the reuse ratio (reused blobs divided by total blobs), confirming that new blob creation has outpaced copied blobs since 2006, when the ratio began to decline.

Fig. 2. Quarterly Reuse Trends

\(^{11}\)The number of projects and blobs was much smaller before 2000.

**Logistic Regression Model.** We expect the nature of the blob to affect its propensity to be reused. To test this hypothesis, we use a logistic regression model where the response variable is set to one if the blob has been copied at least once (i.e., has been committed in at least two projects) within two years of its creation, and zero otherwise. We used WoC's definition of the programming language associated with each blob and categorized less common programming languages in the sample as "other".
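As a rough illustration of this blob-level model, the sketch below fits the same shape of logistic regression on synthetic data; the column names, generated values, and coefficients are invented, and only the model structure (binary flag, creation time, language, and a two-year reuse response) mirrors the description above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 50_000
# Synthetic stand-in for the blob sample summarized in Table 3.
df = pd.DataFrame({
    "binary": rng.integers(0, 2, n),          # 1 = binary blob
    "ctime": rng.uniform(0, 20, n),           # years since blob creation
    "language": rng.choice(["JavaScript", "Java", "C", "other"], n),
})
# Response: copied within two years of creation (simulated here).
logit_p = -3.0 + 0.5 * df["binary"] + 0.05 * df["ctime"]
df["reused"] = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)

model = smf.logit("reused ~ binary + ctime + C(language)", data=df).fit()
print(model.summary())  # coefficient table analogous to Table 4
```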
The descriptive statistics of the variables are presented in Table 3.

Table 3. Blob-level Model - Descriptive Statistics

| Variable | Statistics |
|----------------|-------------------------------------------------|
| Reused | Yes: 6,419,388 (7.5%); No: 78,136,705 (92.5%) |
| Language (counts) | JavaScript: 11,122,849; Java: 4,579,458; C: 3,460,733; (Other): 65,393,053 |
| Creation Time (date) | 5%: 7/29/2012; Median: 2/7/2018; Mean: 5/28/2017 |
| Binary | Yes: 18,516,721 (21.8%); No: 66,039,372 (78.2%) |

The sample dataset is predominantly composed of blobs written in JavaScript, with significant counts also in Java and C. Additionally, the distribution of blob creation time is provided, showing a median date of February 7, 2018. Furthermore, a notable proportion of the blobs, 21.8%, are binary.

The results of our logistic regression model are shown in Tables 4 and 5. The model shows that the coefficients for all predictors are statistically significant with p-values less than 0.0001, meaning they impact the probability of a blob being reused (see Table 4).

Table 4. Blob-level Model - Coefficients

| | Estimate | Std. Error | z value | Pr(>|z|) |
|----------|------------|---------|---------|---------|
| (Intercept) | -18.0293 | 0.0186 | -967.07 | < 2 × 10^{-16} |
| Binary | 0.4775 | 0.0010 | 460.16 | < 2 × 10^{-16} |
| Creation Time | 0.8108 | 0.0010 | 828.34 | < 2 × 10^{-16} |
| C | 0.7142 | 0.0017 | 426.32 | < 2 × 10^{-16} |
| C# | -0.1277 | 0.0033 | -38.15 | < 2 × 10^{-16} |
| Go | 0.3095 | 0.0065 | 47.74 | < 2 × 10^{-16} |
| JavaScript | -0.0832 | 0.0015 | -56.21 | < 2 × 10^{-16} |
| Kotlin | -0.5606 | 0.0133 | -42.02 | < 2 × 10^{-16} |
| ObjectiveC | 0.0810 | 0.0066 | 12.30 | < 2 × 10^{-16} |
| Python | -0.0327 | 0.0030 | -10.97 | < 2 × 10^{-16} |
| R | 0.4070 | 0.0083 | 49.22 | < 2 × 10^{-16} |
| Rust | 0.0879 | 0.0095 | 9.30 | < 2 × 10^{-16} |
| Scala | -0.6168 | 0.0123 | -50.21 | < 2 × 10^{-16} |
| TypeScript | 0.1827 | 0.0046 | 39.38 | < 2 × 10^{-16} |
| Java | 0.0794 | 0.0019 | 42.37 | < 2 × 10^{-16} |
| PHP | 0.3561 | 0.0024 | 151.14 | < 2 × 10^{-16} |
| Perl | 0.7664 | 0.0082 | 92.95 | < 2 × 10^{-16} |
| Ruby | -0.4782 | 0.0044 | -108.58 | < 2 × 10^{-16} |

The ANOVA table (Table 5) provides insights into the significance of the different variables. We see that all the predictors have p-values of effectively zero, meaning that the null hypothesis\(^{12}\) can be rejected. The null deviance is 45,438,151, which represents the deviance of a model with only the intercept. Adding the Binary variable reduces the deviance by 124,114, indicating its strong influence on reuse likelihood. The Creation Time variable further reduces the deviance by 830,322, highlighting its importance in predicting reuse. The Language variable also reduces the deviance by 230,614. Although these reductions might seem small relative to the null deviance, they are statistically significant given the large sample size and the high degrees of freedom involved.

To assess the direction and the size of the predictor effects, we need to go further. In a logistic regression model, a positive coefficient estimate indicates that as the predictor variable increases, the odds of the outcome occurring increase, while a negative coefficient estimate indicates that the odds decrease. Since the coefficients represent the change in the log-odds of the outcome for a one-unit increase in the predictor, we transform these coefficients into odds ratios by exponentiating them in order to interpret the actual impact of each predictor. The odds ratio indicates how the odds of the outcome change with a one-unit increase in the predictor.
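For instance, the transformation can be applied directly to the estimates in Table 4 (a minimal sketch; the selection of coefficients is ours):

```python
import math

# Exponentiating a logit coefficient yields the multiplicative change
# in the odds of reuse per unit increase of the predictor.
for name, beta in {"Binary": 0.4775, "Perl": 0.7664, "Scala": -0.6168}.items():
    print(f"{name}: odds ratio = {math.exp(beta):.2f}")
# Binary: odds ratio = 1.61  (binary blobs raise the odds of reuse)
# Perl:   odds ratio = 2.15
# Scala:  odds ratio = 0.54  (Scala blobs lower the odds of reuse)
```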
The results are shown in Figure 3. This graph displays the odds ratios for the various predictors in the logistic regression model at the blob level. An odds ratio greater than 1 indicates an increase in the likelihood of reuse, while an odds ratio less than 1 indicates a decrease.

\(^{12}\)H0: The reduced model (without the predictor) provides a fit to the data that is not significantly worse than the full model (with the predictor). This suggests that the predictor does not significantly improve the model's fit.

Table 5. Blob-level Model - ANOVA Table

| | Df | Deviance | Resid. Df | Resid. Dev | p.value |
|------------------|----|------------|-----------|--------------|---------------|
| NULL | | | 84,556,092 | 45,438,151.00 | |
| Binary | 1 | 124,114.20 | 84,556,091 | 45,314,036.80 | $< 2 \times 10^{-16}$ |
| Creation Time | 1 | 830,322.63 | 84,556,090 | 44,483,714.17 | $< 2 \times 10^{-16}$ |
| Language | 15 | 230,614.17 | 84,556,075 | 44,253,100.00 | $< 2 \times 10^{-16}$ |

Fig. 3. Blob-level Model - Logistic Regression Odds Ratios

The creation time has the highest positive coefficient. The time variable in the model represents the time elapsed from the blob's creation until the current time, meaning that older blobs have higher time values. The positive coefficient indicates that newer blobs (with smaller time values) are less likely to be reused. This is not because they have been visible for a shorter duration (as we controlled for this with the time-bound definition of reuse), but likely due to other factors we hypothesized, such as fewer artifacts being available for reuse at the time of their creation.

Binary blobs show a significant increase in reuse likelihood, with an odds ratio of about 1.6. Given this confirmed effect, we calculated the reuse propensity for binary and non-binary blobs separately. The results showed that 9.5% of binary blobs were reused, compared to 7.0% of non-binary blobs in our sample.

Different programming languages show varied impacts on reuse likelihood. Blobs written in Perl, C, R, PHP, Go, TypeScript, Objective-C, Java, and Rust are more likely to be reused, with Perl showing the highest odds ratio. In contrast, blobs written in Kotlin, Scala, Ruby, C#, JavaScript, and Python are less likely to be reused, with Kotlin and Scala showing the most significant negative coefficients. This variability suggests that certain languages, perhaps due to their prevalence or specific use cases, are more conducive to code reuse.

**Per-Language Propensity.** Following our logistic regression results, which demonstrated that programming language is a statistically significant factor in the reuse probability of a blob, we calculated the propensity to copy for each programming language, measured as the percentage of reused blobs within that language (see Table 6).
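Computationally, this propensity is simply a per-language mean of the reuse flag, as in the following toy sketch (data invented):

```python
import pandas as pd

# Toy blob table; in the study, `language` comes from WoC's b2sl map
# and `reused` is the two-year reuse flag.
blobs = pd.DataFrame({
    "language": ["Perl", "Perl", "C", "C", "Kotlin", "Kotlin"],
    "reused":   [1, 0, 1, 0, 0, 0],
})
propensity = blobs.groupby("language")["reused"].mean()
print((100 * propensity).round(1))  # per-language percentages as in Table 6
```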
The results show that blobs written in Perl have the highest propensity to be reused, at 18.5%, indicating a strong tendency for code reuse among Perl developers. Conversely, Kotlin has the lowest propensity, at 3.0%, suggesting minimal code reuse in this language. Languages such as C (15.2%) and PHP (9.9%) also show high reuse rates, while Python (6.4%), JavaScript (5.5%), and TypeScript (6.3%) have lower rates. Other languages like Java (7.8%), Go (7.9%), and R (9.8%) fall in the middle range, with moderate reuse rates.

| Language | Ratio | Language | Ratio | Language | Ratio |
|------------|-------|------------|-------|------------|-------|
| C | 15.2% | ObjectiveC | 8.4% | TypeScript | 6.3% |
| C# | 6.0% | Python | 6.4% | Java | 7.8% |
| Go | 7.9% | R | 9.8% | PHP | 9.9% |
| JavaScript | 5.5% | Rust | 6.7% | Perl | 18.5% |
| Kotlin | 3.0% | Scala | 3.8% | Ruby | 5.1% |

**JavaScript Example.** The role of programming language in reuse activity might have several underlying reasons, as previously discussed. One such reason is the presence of a reliable package manager. If true, improvements in a package manager should reduce the propensity to reuse an artifact. To examine this, we analyzed the timeline of the reuse ratio for JavaScript, shown in Figure 4. The figure indicates a sharper decrease in the slope around 2010, the year the NPM package manager was introduced. This downward trend continues until mid-2013, when the copying activity rate drops to around 7% and then levels off. This pattern supports the hypothesis that the introduction and adoption of NPM significantly reduced code reuse through copying.

However, it is important to note that this is just an illustration, and further research is needed to understand this phenomenon fully. Our current study was not focused on this aspect, so we did not conduct an in-depth analysis. Additional investigations with more data points, and comparisons with other languages that have introduced similar improvements in their package management systems, are necessary to confirm that the observed effect is not coincidental or specific to JavaScript alone.

**Blob Size.** The final predictor we hypothesized to affect the reuse probability of a blob was its size. To investigate whether there is a significant difference between the sizes of copied and non-copied blobs, we conducted a t-test comparing these sizes. Our analysis revealed a significant difference (p-value < 2.2e-16), indicating that, on average, copied blobs are smaller than non-copied blobs.

However, the effect varies by language. Specifically, per-language t-tests reveal that copied blobs are smaller in languages like JavaScript and TypeScript, larger in languages such as C and Java, and indistinguishable in size in Objective-C, as detailed in Table 7. For example, in JavaScript, the t-value is -59.9, suggesting that copied blobs are significantly smaller, while in C, the t-value is 195.9, indicating that copied blobs are larger. Similar patterns are observed in other languages, with TypeScript showing a t-value of -35.9 (smaller copied blobs) and Python a t-value of -5.8 (also smaller copied blobs). Conversely, languages like Java (t-value 120.7) and PHP (t-value 28.6) show that copied blobs tend to be larger.
Table 7. Size Difference between Reused and Non-Reused Blobs (positive t value means larger reused blobs)

| Language | t value | p-value | Language | t value | p-value |
|------------|---------|---------------|------------|---------|---------------|
| C | 195.9 | $< 2 \times 10^{-16}$ | Rust | -7.8 | $< 2 \times 10^{-16}$ |
| C# | 12.5 | $< 2 \times 10^{-16}$ | Scala | 9.1 | $< 2 \times 10^{-16}$ |
| Go | 15.5 | $< 2 \times 10^{-16}$ | TypeScript | -35.9 | $< 2 \times 10^{-16}$ |
| JavaScript | -59.9 | $< 2 \times 10^{-16}$ | Java | 120.7 | $< 2 \times 10^{-16}$ |
| Kotlin | -14.5 | $< 2 \times 10^{-16}$ | PHP | 28.6 | $< 2 \times 10^{-16}$ |
| ObjectiveC | 0.7 | 0.430298 | Perl | 5.8 | $< 2 \times 10^{-16}$ |
| Python | -5.8 | $< 2 \times 10^{-16}$ | Ruby | -24.9 | $< 2 \times 10^{-16}$ |
| R | -7.6 | $< 2 \times 10^{-16}$ | Other | -364.9 | $< 2 \times 10^{-16}$ |

This variation highlights that the relationship between blob size and reuse propensity is complex and influenced by language-specific factors. While our findings demonstrate a general trend of smaller copied blobs, the differing patterns across languages suggest that other underlying factors may be at play.

**RQ1-c Key Findings:**

1. The reuse ratio is decreasing over time.
2. 7.5% of blobs have been reused within two years of creation.
3. Older blobs, when controlling for the confounding effect of increased visibility, are more likely to be reused.
4. Binary blobs are roughly 60% more likely to be reused (odds ratio of about 1.6).
5. Programming languages significantly impact reuse likelihood. Blobs written in languages like Perl, C, R, PHP, Go, TypeScript, Objective-C, Java, and Rust are more likely to be reused, while those written in Kotlin, Scala, Ruby, C#, JavaScript, and Python are less likely to be reused.
6. The reuse ratio timeline for JavaScript shows a notable decrease in slope around the year the NPM package manager was introduced.
7. Copied blobs are generally smaller than non-copied blobs, but this is not consistent across languages: reused blobs in C, Java, PHP, Go, C#, Scala, and Perl are larger than non-reused blobs, reused blobs in JavaScript, TypeScript, Ruby, Kotlin, Rust, R, and Python are smaller, and for Objective-C the difference is not statistically significant.

The higher reuse propensity among binary blobs suggests that binaries are inherently more reusable, likely due to their compiled nature, which allows easy integration across projects. The lower reuse likelihood of newer blobs indicates a potential issue with the integration and acceptance of recent contributions, possibly due to rapid technological advancements and shifts in development practices. The significant impact of programming languages on reuse likelihood highlights the importance of language-specific tools and ecosystems. Languages with higher reuse rates, such as Perl and C, benefit from mature ecosystems, while newer or niche languages like Kotlin and Scala show lower reuse rates, potentially due to smaller communities. The decline in JavaScript code reuse after the introduction of NPM suggests that improved package management can reduce the need for direct code copying, promoting more modular and maintainable codebases.

Regarding blob size, the general trend indicates that smaller code artifacts are more reusable, likely due to their simplicity and ease of integration. However, this trend varies significantly across different programming languages.
For example, in languages like JavaScript and TypeScript, copied blobs tend to be smaller, supporting the idea of writing concise and modular code to enhance reusability. In contrast, in languages like C and Java, copied blobs are often larger, suggesting that the nature and use cases of these languages might necessitate larger reusable components. This variation underscores the importance of understanding language-specific factors when considering code reuse management strategies.

5.1.4 RQ1-d: Do characteristics of the originating project affect the probability of reuse? In this section, we first present the logistic regression model. We then demonstrate the per-language reuse propensity and compare it to the blob-level results. Finally, we analyze binary blob reuse.

**Logistic Regression Model.** We applied a logistic regression model to determine the likelihood of a project introducing at least one reused blob. The response variable is binary: 1 if the project has introduced a reused blob, 0 otherwise. Descriptive statistics for the model variables are presented in Table 8. Consistent with the blob-level data, the most frequent languages in our sample are JavaScript and Java.

Table 8. Project-level Model - Descriptive Statistics

| Variable | Description | 5% | Median | Mean | 95% |
|----------|------------------------------------|----|--------|------|-----|
| Reused | Project has at least 1 reused blob | Yes: 205,140 (33.7%); No: 403,195 (66.3%) | | | |
| Blobs | Number of generated blobs | 1 | 15 | 162.7 | 397 |
| Binary | Binary blobs to total blobs ratio | 0 | 0 | 0.1 | 0.6 |
| Commits | Number of commits | 1 | 5 | 57.0 | 84 |
| Authors | Number of authors | 1 | 1 | 2.5 | 3 |
| Forks | Number of forks | 0 | 0 | 1.5 | 1 |
| Stars | Number of GitHub stars | 0 | 0 | 3.4 | 2 |
| Time | Earliest commit time | 7/18/2013 | 3/26/2018 | 9/15/2017 | 3/3/2020 |
| Activity | Total months project was active | 1 | 1 | 2.5 | 8 |
| Language | Primary language (counts) | JavaScript: 86,065; Java: 43,172; Python: 40,503; PHP: 24,659; C: 22,258; (Other): 391,678 | | | |

Spearman's correlation analysis, suitable for the observed heavily skewed distributions, is presented in Table 9. The number of commits shows a high correlation with two other predictors: activity time (0.68) and the number of blobs (0.67). These high correlations indicate redundancy, as the number of commits does not add significant information beyond what is already captured by activity time and the number of blobs. This redundancy can lead to multicollinearity, potentially distorting the model's coefficients and reducing interpretability. Consequently, we remove the number of commits from the model, simplifying it without sacrificing explanatory power. All other correlations are below 0.52 and are not concerning.
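The screening step can be sketched as follows; the frame and the induced correlations are synthetic, and only the rank-correlation check and the removal of the redundant predictor mirror the procedure described above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 10_000
commits = rng.poisson(50, n)
# Correlated stand-ins: blob count and activity scale with commits.
projects = pd.DataFrame({
    "commits": commits,
    "blobs": commits * 3 + rng.poisson(10, n),
    "activity": commits / 20 + rng.random(n),
    "authors": rng.poisson(2, n) + 1,
})

# Spearman is rank-based and therefore robust to heavy skew.
print(projects.corr(method="spearman").round(2))

# Drop the predictor that is highly correlated with several others
# (here: commits) before fitting the logistic regression model.
projects = projects.drop(columns=["commits"])
```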
Table 9. Project-level Model - Spearman's Correlations Between Predictors

| | Blobs | Binary | Commits | Authors | Forks | Stars | Time | Activity |
|--------|-------|--------|---------|---------|-------|-------|------|----------|
| Blobs | 1.00 | 0.46 | 0.67 | 0.34 | 0.22 | 0.22 | 0.09 | 0.52 |
| Binary | - | 1.00 | 0.18 | 0.12 | 0.06 | 0.05 | 0.02 | 0.14 |
| Commits | - | - | 1.00 | 0.45 | 0.27 | 0.26 | 0.05 | 0.68 |
| Authors | - | - | - | 1.00 | 0.32 | 0.22 | 0.05 | 0.38 |
| Forks | - | - | - | - | 1.00 | 0.48 | 0.14 | 0.28 |
| Stars | - | - | - | - | - | 1.00 | 0.13 | 0.28 |
| Time | - | - | - | - | - | - | 1.00 | 0.05 |
| Activity | - | - | - | - | - | - | - | 1.00 |

The results for the project-level logistic regression model are shown in Tables 10 and 11. All the variables in the model have p-values less than 0.05, indicating that they are statistically significant in predicting the likelihood of a project introducing reused blobs (see Table 10). This demonstrates strong evidence against the null hypothesis, suggesting that these variables do have an effect on reuse.

Examining the ANOVA results (Table 11) provides further insight into the impact and significance of these predictors. We see that all the predictors have p-values of effectively zero, meaning that the null hypothesis can be rejected. The deviance values in the ANOVA table indicate the reduction in model deviance when each predictor is included. For example, adding the number of blobs to the model reduces the deviance by 131,219.53, a substantial reduction that underscores its important role in the model. These results confirm the importance of these predictors in explaining the variability in the likelihood of reuse.

Table 10. Project-level Model - Coefficients

| | Estimate | Std. Error | z value | Pr(>|z|) |
|----------|----------|------------|---------|----------|
| (Intercept) | -4.79 | 0.16 | -30.01 | < 2 × 10^{-16} |
| Blobs | 0.61 | 0.00 | 228.94 | < 2 × 10^{-16} |
| Binary | 0.77 | 0.02 | 40.09 | < 2 × 10^{-16} |
| Authors | 0.09 | 0.01 | 8.24 | < 2 × 10^{-16} |
| Forks | 0.31 | 0.01 | 27.72 | < 2 × 10^{-16} |
| Stars | 0.06 | 0.01 | 7.19 | 6.61 × 10^{-13} |
| Time | 0.10 | 0.01 | 12.00 | < 2 × 10^{-16} |
| Activity | 0.07 | 0.01 | 10.48 | < 2 × 10^{-16} |
| C | -0.33 | 0.02 | -19.60 | < 2 × 10^{-16} |
| C# | -0.30 | 0.02 | -15.74 | < 2 × 10^{-16} |
| Go | -0.29 | 0.04 | -7.70 | 1.33 × 10^{-14} |
| JavaScript | 0.21 | 0.01 | 22.58 | < 2 × 10^{-16} |
| Kotlin | -0.23 | 0.05 | -4.30 | 1.75 × 10^{-5} |
| ObjectiveC | -0.13 | 0.03 | -3.63 | 0.000288 |
| Python | -0.19 | 0.01 | -14.78 | < 2 × 10^{-16} |
| R | -0.27 | 0.05 | -5.93 | 3.04 × 10^{-9} |
| Rust | -0.48 | 0.07 | -6.65 | 2.87 × 10^{-11} |
| Scala | -0.27 | 0.07 | -3.79 | 0.000153 |
| TypeScript | 0.88 | 0.03 | 34.57 | < 2 × 10^{-16} |
| Java | -0.25 | 0.01 | -20.90 | < 2 × 10^{-16} |
| PHP | 0.29 | 0.01 | 19.59 | < 2 × 10^{-16} |
| Perl | -0.31 | 0.10 | -3.20 | 0.001395 |
| Ruby | 0.63 | 0.02 | 33.18 | < 2 × 10^{-16} |
Table 11. Project-level Model - ANOVA Table

| | Df | Deviance | Resid. Df | Resid. Dev | p.value |
|----------|----|----------|-----------|------------|---------|
| NULL | | | 608,334 | 777,660.48 | |
| Blobs | 1 | 131,219.53 | 608,333 | 646,440.95 | < 2 × 10^{-16} |
| Binary | 1 | 662.94 | 608,332 | 645,778.01 | < 2 × 10^{-16} |
| Authors | 1 | 926.69 | 608,331 | 644,851.32 | < 2 × 10^{-16} |
| Forks | 1 | 2,084.02 | 608,330 | 642,767.30 | < 2 × 10^{-16} |
| Stars | 1 | 63.77 | 608,329 | 642,703.53 | 1.44 × 10^{-15} |
| Time | 1 | 156.98 | 608,328 | 642,546.54 | < 2 × 10^{-16} |
| Activity | 1 | 139.31 | 608,327 | 642,407.24 | < 2 × 10^{-16} |
| Language | 15 | 5,178.20 | 608,312 | 637,229.03 | < 2 × 10^{-16} |

To understand the size and direction of the impacts, we look at the odds ratios inferred from the logistic regression coefficients. The odds ratio is calculated as the exponential of the coefficient: an odds ratio greater than 1 indicates a positive impact, while an odds ratio less than 1 indicates a negative impact. The results are shown in Figure 5.

Fig. 5. Project-level Model - Logistic Regression Odds Ratios

The logistic regression analysis shows that several predictors significantly impact the likelihood of a project having a reused blob. TypeScript, Binary, Ruby, and Blobs have the strongest positive effects, indicating that increases in these variables substantially raise the odds of a project introducing a reused blob. Other positive predictors include Forks, PHP, JavaScript, Time, Authors, Activity, and Stars, which also increase the likelihood, though to a lesser extent. Conversely, predictors like Rust, C, Perl, C#, Go, Scala, R, Java, Kotlin, Python, and Objective-C negatively impact the odds, suggesting that increases in these variables decrease the likelihood of a project introducing a reused blob.

When interpreting the time variable, it is important to note that since the earliest commit timestamp is represented as a number, we calculated the time elapsed from the earliest commit to the current date for better interpretability. A larger time value indicates an older earliest commit. The model shows that time has a positive coefficient, suggesting that the older the earliest commit, the higher the probability of introducing reused blobs. This result could be influenced by two factors. First, at the blob-level model, we already observed that older blobs have a higher probability of being reused. Additionally, while the time-bound definition of reuse controls for the confounding effect of longer visibility at the blob level, it does not account for the longer visibility of the project itself. Therefore, the observed result might also be affected by the project's age, which implies longer visibility, even though the blob is reused within two years of its creation.

**Per-Language Propensity.** The project-level model highlights the significance of programming languages in the likelihood of a project introducing a reused blob. To explore this further, we calculated the percentage of projects in each language that have introduced reused blobs. From our previous analysis (RQ1-a), we know that approximately 29% of projects introduced at least one reused blob. When using the time-bound definition of copying, this ratio increased to 33% in our sample.
The results for each language are shown in Table 12.

| Language | Ratio | Language | Ratio | Language | Ratio |
|-----------|--------|----------|--------|----------|--------|
| C | 33.2% | ObjectiveC | 40.0% | TypeScript | 62.3% |
| C# | 37.0% | Python | 30.5% | Java | 36.2% |
| Go | 31.3% | R | 28.5% | PHP | 46.4% |
| JavaScript | 41.2% | Rust | 31.5% | Perl | 29.9% |
| Kotlin | 40.0% | Scala | 36.0% | Ruby | 51.2% |

The ratio of projects that have introduced reused blobs varies significantly across different programming languages, offering new insights compared to the blob-level analysis. For example, projects dominated by TypeScript have the highest probability (62%) of introducing at least one reused blob. This finding is particularly interesting because, at the blob level, the propensity to copy in TypeScript was lower than average. This discrepancy suggests that TypeScript projects, acting as upstream in the language's supply chain, are less centralized: developers in this language seem more inclined to incorporate code from various, possibly unknown, projects.

Other languages also show distinct patterns. For instance, Ruby projects have a high probability (51%) of reusing blobs, whereas Python projects have a lower probability (30.5%). This variation indicates that the likelihood of code reuse is strongly influenced by the primary language of the project, reflecting different practices and community norms across languages. These insights emphasize the importance of considering the programming language when studying code reuse patterns in software projects.

To make these results comparable to the blob-level analysis, we calculated the copied blob ratio (copied blobs to total blobs) for each project and took the average of this ratio over the projects in each language. An important difference from the blob-level propensity is that, at the blob level, language assignment was based on the file extension of each blob, with binary blobs categorized as "Other". In this project-level analysis, the language of a blob is determined by the predominant language of the project it belongs to. For example, a Python-written blob in a C-dominated project is counted as a C blob. Similarly, binary blobs are assigned the dominant language of their respective projects. The results of this new definition are shown in Table 13.

| Language | Ratio | Language | Ratio | Language | Ratio |
|----------|--------|----------|--------|----------|--------|
| C | 15.4% | ObjectiveC | 9.5% | TypeScript | 5.6% |
| C# | 4.7% | Python | 7.3% | Java | 5.8% |
| Go | 6.7% | R | 7.2% | PHP | 9.5% |
| JavaScript | 8.8% | Rust | 5.1% | Perl | 21.2% |
| Kotlin | 3.4% | Scala | 3.5% | Ruby | 5.3% |

The propensity to copy varies when using this project-level definition compared to the blob-level definition (see Table 6). For example, the propensity to copy in JavaScript-dominated projects is higher than for JavaScript blobs in general (8.8% vs. 5.5%). This indicates a greater likelihood of reuse within JavaScript projects compared to individual JavaScript blobs from various projects. This could be attributed to the modularity and strong reuse culture in the JavaScript ecosystem, where libraries and frameworks are frequently shared and integrated. JavaScript projects often incorporate multiple languages, such as HTML and CSS for web development or server-side languages for backend functionality, enhancing reuse through shared components.
The evolution of JavaScript projects, involving various tools and libraries, also contributes to the higher reuse rate within the project context.

In Perl-dominated projects, the propensity to reuse is higher than for Perl blobs in general (21.2% vs. 18.5%). This suggests that blobs within Perl projects are more likely to be reused than individual Perl blobs from different projects. Perl's strong culture of code reuse and sharing, exemplified by the Comprehensive Perl Archive Network (CPAN), encourages the use and distribution of reusable code modules. Perl projects often include a wide range of scripts and utilities shared across different applications, enhancing reuse. Furthermore, Perl's use in scripting, text processing, and system administration often requires the reuse of common patterns and libraries, contributing to the higher reuse rate within projects.

Conversely, R-dominated projects show a lower propensity to reuse compared to R blobs in general (7.2% vs. 9.8%). This implies that individual R blobs are more likely to be reused than blobs within R-dominated projects. R is primarily used for statistical computing and data analysis, where specific scripts and functions are reused across different analyses. However, R projects are often tailored to specific datasets and analyses, resulting in lower overall reuse within the project context. The specialized nature of many R projects, with unique data processing and analysis pipelines, limits reuse compared to individual reusable components like functions and libraries.

Java-dominated projects exhibit a lower propensity to reuse compared to Java blobs in general (5.8% vs. 7.8%). This indicates that individual Java blobs are more likely to be reused than blobs within Java-dominated projects. Java is widely used across various domains, and reusable components like libraries and frameworks are common across different projects. However, Java projects tend to be large and complex, with specific architectures and dependencies that may limit cross-project reuse. The high degree of customization and specificity in Java enterprise applications reduces the reuse rate within the project context compared to the reuse of individual Java blobs or libraries.

These analyses reflect the differing dynamics of code reuse in various programming ecosystems. Understanding these differences can help improve strategies for fostering code reuse and optimizing software development practices across different languages and project contexts.

**Binary Blob Analysis.** Although previous analyses indicated that binary blobs are more likely to be reused, we aimed to investigate whether this propensity varies across projects dominated by different programming languages. At the blob level, it was not feasible to ascertain the programming language of a binary blob. However, at the project level, such analysis becomes possible. Therefore, we examined the reused binary blob ratio (the percentage of reused binary blobs to total reused blobs) within each language and compared it to the binary blob ratio (the percentage of binary blobs to total blobs) within the same language, utilizing a t-test to identify any significant differences.

Consistent with the blob-level analysis, the reused binary blob ratio exceeds the general binary blob ratio across all programming languages, indicating a higher likelihood of reuse for binary blobs. This observation raises questions about language-specific differences in binary blob reuse. Specifically, we hypothesize that binary blobs are more frequently reused in certain languages than in others. In other words, we want to know whether identifying a reused binary blob allows us to infer that it is more likely to originate from projects written in particular languages.

Our findings confirm this hypothesis, as the proportion of reused binary blobs varies significantly among different programming languages. Nevertheless, we hypothesize that at least some of this difference stems from the general difference in binary blob ratios across languages and is not limited to reuse. Our statistical tests reveal that the binary blob ratios indeed differ significantly across languages. Consequently, the ratio of reused binary blobs also exhibits significant variation among languages, meaning that this variation does not necessarily reflect differing binary reuse practices.

We want to determine whether the higher number of reused binary blobs in a certain language is solely due to the general prevalence of binary blobs in that language, or whether some languages indeed tend to reuse more binary blobs. To control for this confounding effect, we normalize the reused binary ratio by the total binary ratio. Given the binary ratio $br$ of a project (binary blobs over total blobs) and the copied binary ratio $cbr$ (reused binary blobs over total reused blobs), we define the normalized metric $m = cbr/br$. This metric averaged 4.104 over all the projects in our sample. Using a linear regression with the project's primary language as a predictor, we obtained the results shown in Table 14.

$$m = \frac{cbr}{br} = \frac{cbc/cc}{bc/c}$$

$m$: normalized binary reuse metric
$cbr$: copied binary ratio
$br$: binary ratio
$cbc$: copied binary count
$cc$: copied count
$bc$: binary count
$c$: total count
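As a worked toy example (all counts invented for illustration), consider a project with $c = 1000$ blobs, of which $bc = 100$ are binary; suppose $cc = 50$ of its blobs were copied, $cbc = 20$ of them binary. Then

\[
m = \frac{cbc/cc}{bc/c} = \frac{20/50}{100/1000} = \frac{0.40}{0.10} = 4,
\]

meaning that binary blobs are four times over-represented among this project's reused blobs, close to the sample-wide average of 4.104.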
| Language | Metric | p-value | Language | Metric | p-value |
|----------|--------|---------|----------|--------|---------|
| C | 3.33 | 0.810722 | Rust | 6.06 | 0.422024 |
| C# | 4.92 | 0.025270 | Scala | 5.38 | 0.545028 |
| Go | 5.73 | 0.173372 | TypeScript | 5.17 | 0.063922 |
| JavaScript | 7.04 | $< 2 \times 10^{-16}$ | Java | 4.91 | 0.000497 |
| Kotlin | 5.42 | 0.306698 | PHP | 4.49 | 0.035326 |
| ObjectiveC | 2.17 | 0.217673 | Perl | 3.32 | 0.975449 |
| Python | 2.19 | 0.005547 | Ruby | 3.51 | 0.951277 |
| R | 2.65 | 0.614773 | | | |

Our analysis reveals that this normalized binary reuse metric varies across programming languages. Notably, C#, JavaScript, Python, Java, and PHP exhibit statistically significant differences (p-value < 0.05). In particular, JavaScript projects demonstrate a higher tendency to reuse binary blobs, while Python projects show a lower tendency. This suggests that in JavaScript-dominated projects, reusing binary blobs is likely more efficient and cost-effective than reusing code. Conversely, Python projects might benefit more from reusing code rather than binary blobs.

The complete coefficients and regression ANOVA tables are available in the online appendix.

RQ1-d Key Findings:

1. Project properties significantly impact the probability of their blobs being reused, with the binary ratio, number of blobs, forks, authors, activity duration, and stars having a positive impact.
2. Older projects are more likely to have introduced reused blobs.
Blobs residing in projects dominated by different programming languages have varying probabilities of reuse, with TypeScript, Ruby, PHP, and JavaScript having higher probabilities, and Rust, C, Perl, C#, Go, Scala, R, Java, Kotlin, Python, and Objective-C having lower probabilities.\n4. On average, 33.7% of projects have introduced at least one reused blob, but this percentage varies significantly between languages, with TypeScript (62.3%) and Ruby (51.2%) having the highest propensity, and R (28.5%) and Perl (29.9%) the lowest.\n5. The tendency to reuse binary blobs is much higher in JavaScript projects, while Python projects show a lower tendency.\n\nThe project-level analysis reveals that various factors significantly influence the likelihood of code reuse in open source software projects. Projects with more blobs, binary blob ratio, and longer activity tend to exhibit higher reuse rates. This aligns with our hypothesis that project health, activity, and popularity signals play an important role in promoting reuse.\n\nThe variation in reuse likelihood across different programming languages underscores the influence of language-specific ecosystems and practices, consistent with blob-level results. For instance, TypeScript and Ruby projects show the highest propensity for reuse, which may be due to their robust ecosystems and strong community practices that encourage code sharing and reuse. Conversely, languages like Python and Perl have lower reuse rates, suggesting different reuse dynamics and possibly a need for improved tools and practices to foster reuse. However, the impact between the blob\u2019s language and the language of the project it resides in differs. This suggests that the underlying factors behind these differences are not just technical aspects of the languages and their tools, but also their community culture and practices.\n\nThe significant reuse of binary blobs, particularly in languages like JavaScript, indicates that binary artifacts are valuable assets in software projects. This might be due to the efficiency and ease of integrating precompiled binaries compared to source code. However, the lower reuse rate of binary blobs in Python suggests that this language\u2019s ecosystem favors source code reuse, which could be due to its dynamic nature and the extensive use of interpreted scripts. These findings have important implications for the development and support of tools that facilitate reuse in different programming languages. For languages like JavaScript, where binary blob reuse is prevalent, enhancing asset libraries could be beneficial. In contrast, for languages like Python, where code reuse is more advantageous, improving code package managers would be more appropriate. This differentiation underscores the necessity for tailored support tools to optimize reuse practices in various programming environments.\n\nThese findings highlight the impact of project context on reuse patterns and suggest that different definitions and granularity levels can yield varying insights into code reuse behaviors.\n\n5.2 RQ2: How do developers perceive and engage with copy-based reuse?\n\nAcross three rounds, we received 247 complete responses from reusers and 127 from creators. There were also 360 and 178 partial responses, making the total of 607 and 305 responses from reusers and creators respectively. The results are shown in Table 15.\n\nAs will be discussed in Section 7.1.2, the identified originating repository might not always be the true creator of the blob. 
39% of developers identified as creators reported reusing the blob from another source. Additionally, reusers might have obtained the blob from another reuser and not the original creator (see Section 7.1.3). Among the reusers who confirmed reusing the blob, 43% acknowledged the originating project as the source, 48% reported copying it from elsewhere, and 9% did not answer the question.

Table 15. Survey Participation

| | Total | Started | Completed | Response Rate | Completion Rate |
|--------|-------|---------|-----------|---------------|-----------------|
| Creator| 3,144 | 305 | 127 | 9.70% | 4.04% |
| Reuser | 6,338 | 607 | 247 | 9.58% | 3.90% |
| Total | 9,482 | 912 | 374 | 9.62% | 3.94% |

These findings provide important estimates: the fraction of reuse within open source software (OSS) is at least 61%, and the fraction of reuse from originating projects is at least 43%. This data is essential for understanding the dynamics of code reuse within OSS, highlighting the significance of both direct reuse from original projects and secondary reuse through intermediate projects.

Furthermore, only 60% of those identified as reusers confirmed reusing the blob, while the remaining 40% claimed to have created it (see Table 16). This discrepancy can be attributed to several factors. First, some individuals might indeed be the original authors of the blob in the originating project, implying they have reused their own resources. Second, this gap could be explained by activities in private repositories (e.g., Developer A creates a file in a private repository, Developer B copies it to a public repository, and then Developer A reuses it in another public repository). Third, as mentioned in Section 4.3, concerns about potential licensing violations might have made many reusers uncomfortable admitting the reuse explicitly. Additionally, developers’ faulty memory could play a role, especially for reuse instances that occurred a long time ago.

One potential area for further investigation could be examining the project owners and commit authors for each copy instance to gain a better understanding of this gap. However, this was not pursued further in this study as it was not the main focus. Exploring these factors in future research could provide deeper insights into the complexities of code reuse and attribution within open source software projects.

Table 16. Identified vs. Claimed Creators & Reusers

| Claimed | Identified Creators | Identified Reusers | Total |
|---------|---------------------|--------------------|-------|
| Creator | 77 (61%) | 99 (40%) | 176 |
| Reuser | 50 (39%) | 148 (60%) | 198 |
| Total | 127 | 247 | 374 |

Another dimension of the survey explored the intentions of creators for others to reuse their artifacts. Sixty-two percent of creators indicated that their resources were intended for reuse by others. When asked about the helpfulness of the particular blob on a scale from 1 to 5 (with 5 being the most helpful), reusers rated the average helpfulness at 3.81, while creators rated it at 4.24. This suggests that developers are well aware of the reuse potential of their artifacts, even if the blob may be essential primarily for their own projects.

In the background sections, we discussed the risks associated with this type of reuse. We asked reusers if they were concerned about these risks as well. 
On a scale from 1 to 5 (with 5 being the most concerned), the average concern about bugs in the reused file was 1.83, and the average concern about changes in the original file was 2.35. Several factors might contribute to the low level of concern among developers, including trust in the original code’s quality or confidence in their own testing processes. However, this lack of concern could facilitate the spread of potentially harmful code, even if the creator fixes the original code. The fact that reusers are not significantly worried about these risks amplifies the potential risk at the OSS supply chain level.

Next, we asked participants how likely they would be to use a package manager if one were available for the particular blob. On a scale from 1 to 5 (with 5 being the most likely), the average likelihood of using a package manager was 2.93. This indicates that although developers may not be very concerned about bugs or changes (potential improvements), many would still use such a tool if it were available. This suggests that “package-manager” type tools for refactoring or at least maintaining reused code might gain traction if developed. These results are shown in Table 17.

Table 17. Rating-scale survey responses

| Question (audience) | Responses | Average | Median | StdDev |
|--------------------------------------|-----------|---------|--------|--------|
| How helpful? (creators) | 156 | 4.25 | 5 | 1.15 |
| How helpful? (reusers) | 185 | 3.82 | 4 | 1.32 |
| Concern about bugs? (reusers) | 185 | 1.85 | 1 | 1.33 |
| Concern about changes in the original file? (reusers) | 187 | 2.33 | 2 | 1.56 |
| Likelihood of using a package manager? (reusers) | 184 | 2.89 | 3 | 1.64 |

Finally, the thematic analysis of reasons for reuse, specifically responses to the question “why”, revealed eight themes from the 162 responses we received (see Table 18). This analysis provides a nuanced understanding of the motivations behind code reuse, highlighting several key themes.

Table 18. Themes of reasons for reuse

| Theme | Description | Frequency |
|-------|------------------------------|-----------|
| Demo | demonstration, test, prototype | 14 |
| Dependency | part of a library | 11 |
| Education | learning purposes | 16 |
| Functionality | specific functionality | 39 |
| Own | own reuse | 2 |
| Resource | image, style, dataset, license | 30 |
| Template | template, starting point, framework | 14 |
| Tool | parser, plugin, SDK, configuration | 23 |

As expected, one of the main reasons for reuse was to provide specific functionality. This indicates that developers often reuse code to incorporate existing functionalities into their projects, saving time and effort in development, a practice well-documented in the literature [48]. This underscores the importance of reusable components in efficient software development.

Another observed theme was the reuse of various resources, including datasets, instructions, license files, and graphical or design objects (e.g., PNG, JPEG, fonts, styles). This aligns with the significant reuse of binary blobs identified in RQ1. The inclusion of diverse resources indicates that developers often depend on readily available materials to enhance their projects’ visual or functional aspects. While the literature acknowledges this practice, our findings suggest a slightly higher emphasis on resource reuse. 
This indicates that resource management might be more important for developers than previously thought.

---

\(^{14}\)Since survey participants were chosen through stratified sampling, these frequencies do not represent the actual data distribution.

Reusing tools such as parsers, plugins, SDKs, and configuration files was mentioned 23 times. This practice is noted for its practicality and efficiency in setting up development environments and ensuring consistency across projects. This highlights the role of auxiliary software components in streamlining development processes and providing necessary infrastructure or functionality.

Assignments, school projects, learning objectives, and similar concepts were another prominent theme. This emphasizes the role of code reuse in the software development knowledge supply chain, as developers reuse existing code to understand and learn new concepts.

Code reuse for demonstration, testing, and prototyping purposes was identified 14 times. This theme suggests that developers often reuse code to quickly create prototypes or test scenarios without focusing on the quality, security, or licensing of the reused code. The priority in these cases is to achieve rapid results. This aligns with the finding of Juergens et al. [48] that developers often clone code to create prototypes and perform tests. Some of these quick prototypes, however, may end up as active projects.

Templates, starting points, and frameworks were mentioned 14 times. Developers often clone templates or frameworks to have a solid foundation for their projects, a practice supported by the findings of Roy and Cordy [80]. This approach leverages existing structures to expedite development and ensure consistency.

Being part of a library or dependency management was cited 11 times. This practice is highlighted in studies that emphasize the importance of managing dependencies within the development process, such as the study by Roy and Cordy [80]. Although checking in library files is not considered best practice, many developers do so to maintain specific versions and avoid potential issues with updates or changes. This conscious decision highlights a trade-off between best practices and practical needs.

Reusing one’s own code was mentioned twice. The theme of “own reuse”, where developers clone their own code for reuse in new projects, is less prominently featured in the literature compared to other reasons for code cloning. Developers clone their own code to ensure consistency, save time, and leverage previously written and tested code. This practice is practical and efficient, especially when developers are familiar with the code and its functionality. However, the literature does not emphasize this reason as strongly. While studies acknowledge the broader concept of code reuse, their focus is more on reusing code from external sources, libraries, or for educational purposes [48, 80]. This discrepancy suggests that “own reuse” might be an underexplored area in existing research. It indicates that while developers recognize and practice it frequently, it may not be as thoroughly documented or emphasized in the academic literature. This gap highlights an opportunity for further investigation into how and why developers engage in “own reuse” and its impact on software development processes.

There were also 13 instances where responses were either incomprehensible or the respondent did not remember the file or the reason for reuse.

RQ2 Key Findings:

1. 39% of identified creators stated they reused the blob from another source.
2. Among reusers, 43% acknowledged the originating project (direct reuse), while 48% copied from elsewhere (indirect reuse).
3. Reuse within the OSS landscape is at least 61%.
4. 60% of reusers confirmed reuse; 40% claimed creation.
5. 62% of creators intended their resources for reuse.
6. Reusers are not very concerned about potential bugs or changes in the original file.
7. Reusers show moderate willingness to use a package manager if one were available.
8. The main reuse themes are: functionality, resources, tools, education, demo/testing/prototyping, templates, dependencies, and own reuse.

The findings reveal that a non-negligible portion of developers engage in copy-based reuse within the OSS community. This practice is common, with many reusers sourcing code not directly from the original creators but through intermediaries. Understanding these dynamics is important for improving the transparency and traceability of reused code, which could potentially enhance code quality and security.

The discrepancies between identified and claimed creators highlight complexities in attribution and ownership. Additionally, survey respondents’ replies are not always accurate or truthful, which further complicates understanding the true origins of code. This gap underscores the need for better tracking mechanisms within repositories to accurately reflect code origins. Future research could delve deeper into these factors, offering insights that could inform policy and tooling improvements in OSS development.

Creators often intend their code to be reused, and both creators and reusers recognize the utility of such artifacts. This positive perception suggests that promoting reuse can be beneficial for the community, fostering collaboration and innovation. However, the difference in helpfulness ratings indicates that there might be room for improving the clarity and documentation of reusable code to better meet reusers’ needs.

Despite the low concern about potential risks like bugs and changes, the moderate interest in package management tools suggests an opportunity for developing solutions that can help maintain and refactor reused code. Such tools could mitigate risks by providing updates and improvements in a managed manner, enhancing the overall reliability of reused code.

The thematic analysis of reuse motivations provides a comprehensive view of why developers opt for copy-based reuse. Reusing for specific functionality underscores the importance of modular and reusable code in software development. It also highlights the potential benefits of well-documented and easily integrable code components that can be readily reused by others.

The practice of checking in library files suggests a deliberate effort to maintain stability and avoid the uncertainties that might come with updates or changes. However, it also highlights a potential area for improvement in developer education and best practices, as well as the importance of tools that can help manage dependencies more effectively. These insights contribute to our understanding of the motivations behind code reuse and the practical considerations developers face in maintaining their projects.

While reusing for demos and testing can accelerate development and innovation, it also raises potential risks. Developers may inadvertently propagate vulnerabilities or violate licenses, leading to broader issues within the software supply chain. 
Highlighting the importance of balancing speed and security during testing phases can inform best practices and educational efforts.

The education theme underscores the pedagogical value of code reuse. Reusing existing code allows learners to understand real-world applications and coding practices, fostering skill development. However, it also emphasizes the need for proper guidance and resources to ensure that educational reuse is done ethically and effectively. Encouraging educators to integrate lessons on best practices in code reuse can enhance the quality of learning and adherence to legal and ethical standards.

The proportion of responses that were not meaningful, or in which the respondent did not recall the file, indicates that not all reuse instances are well-documented or remembered by developers. This lack of clarity can hinder the understanding and traceability of reuse practices. It highlights the need for better documentation and tracking mechanisms to ensure that the reasons and contexts for reuse are transparent and well understood. Implementing such measures can improve the management of reused code and resources, reducing potential risks associated with undocumented reuse.

6 IMPLICATIONS

6.1 For Developers

Copy-based reuse enables developers to save time and effort by leveraging existing code. However, it introduces risks such as maintenance fragmentation, security vulnerabilities, and outdated dependencies. To address these challenges, developers should adopt tools and practices to track reused code, ensure compliance with licensing requirements, and mitigate risks associated with unverified code quality.

Fostering a practice of systematically reviewing and documenting reused code not only enhances its reliability and maintainability, but also contributes to the overall sustainability of software projects. Additionally, staying informed about updates to reused code and integrating these updates promptly can further reduce risks associated with outdated or insecure components.

6.2 For Businesses

Businesses that rely on open source software must proactively address the inherent risks of copy-based reuse, including security vulnerabilities and potential non-compliance with licensing terms. Investing in robust tools for tracking and maintaining reused code is critical to safeguarding the software supply chain. This effort should encompass implementing workflows for regularly updating and reviewing reused components.

Moreover, businesses should actively support smaller open source projects that provide valuable code contributions. Such support not only enhances the quality and reliability of business-critical software, but also fosters goodwill and collaboration within the open source community. By taking these steps, businesses can effectively mitigate risks while strengthening the ecosystem upon which they rely.

6.3 For the Open Source Community

The open source community plays an important role in ensuring the safe and effective reuse of code. By promoting best practices for ethical and secure reuse, such as adopting standardized licensing and improving quality benchmarks, the community can minimize risks and build trust in shared resources. Equally important is supporting small and medium-sized projects that contribute significantly to the reusable code base. 
Providing mentorship, funding, and collaboration opportunities can bolster the overall open source ecosystem, fostering innovation and cooperation across projects.

Additionally, establishing centralized repositories or resources that facilitate traceability and offer detailed metadata on provenance, authorship, and licensing can streamline the reuse process and mitigate associated risks. These efforts collectively enhance the reliability, sustainability, and scalability of open source software.

6.4 For Researchers and Educators

Researchers have a unique opportunity to investigate finer-grained reuse patterns, such as instances involving slight modifications or partial reuse, to better understand the factors influencing reuse and its long-term impact on software quality and security. Such insights can guide the development of tools and methodologies that promote safe and effective reuse practices.

Educators should integrate lessons on ethical reuse practices, licensing compliance, and dependency management into software engineering curricula. By leveraging real-world case studies and addressing practical challenges, such as balancing development speed with security concerns, educators can equip future developers to navigate the complexities of software reuse responsibly. This approach will help ensure that the next generation of software professionals actively supports the sustainability and growth of open source ecosystems.

6.5 For OSS Platform Maintainers

Platforms like GitHub and GitLab are well-positioned to enhance practices surrounding copy-based reuse. Improving traceability mechanisms to preserve provenance, authorship, and licensing metadata is essential for minimizing risks such as unintentional license violations and outdated dependencies. Integrating features for automated detection of license conflicts, dependency vulnerabilities, and changes in reused code can further empower developers to manage their projects efficiently and securely.

Additionally, platforms can offer educational resources and in-platform guidance to encourage best practices for reuse and compliance. By fostering a culture of informed and collaborative reuse, platform maintainers can contribute significantly to the long-term sustainability and resilience of the open source ecosystem.

7 LIMITATIONS

7.1 Internal Validity

7.1.1 Commit Time. The identification of the first occurrence of a blob, and consequently the construction of its reuse timeline, is based on commit timestamps. These timestamps are not necessarily accurate, as they depend on the committer’s system clock. The dataset we utilized followed the suggestions of Flint et al. [22] and other methods to eliminate incorrect or questionable timestamps, which increases the reliability of our reuse timeline. We also used version history information to ensure that the time of a parent commit does not postdate that of its child commits [46]. This adds an extra layer of consistency and validation, further enhancing the accuracy of our data.

7.1.2 Originating Project. The accuracy of origination estimates is highly reliant on the completeness of data. Even if we assume that the World of Code (WoC) collection is exhaustive, it is possible that some blobs may have originated in a private repository before being copied into a public one. This means that the originating repository in WoC may not be the actual creator of the blob. 
This scenario suggests that even with a comprehensive dataset, there could be instances of code reuse that remain undetected, adding another layer of complexity to understanding the full extent of reuse across open source projects. For example, a 3D cannon pack asset\(^\text{15}\) was committed by 38 projects indexed by WoC. However, that asset was originally created earlier in the Unity Asset Store [46].

---

\(^\text{15}\)https://assetstore.unity.com/packages/3d/props/weapons/stylish-cannon-pack-174145

By utilizing the extensive WoC collection, we provide a broad and detailed analysis of code reuse, capturing a significant portion of open source activity even if some instances of private-to-public transitions are missed. Additionally, the examples we identified, such as the 3D cannon pack asset, highlight the practical implications and real-world relevance of our findings, demonstrating the robustness of our analysis despite potential data gaps. Our approach addresses the inherent challenges of tracking code origination and reuse, offering a framework that can be refined and expanded in future research to further improve accuracy and comprehensiveness.

7.1.3 Copy Instance. A unique combination of blob, originating project, and destination project might not always accurately represent the actual pattern of reuse, because some destination projects could have reused the blob from a source other than the originating project. For instance, given three projects A, B, and C, in order of blob creation, project C might copy from either project A or B. Additionally, certain blobs are not reused but are created independently in each repository, such as an empty string or a standard template automatically generated by a common tool [46]. Such blobs are excluded using the list provided by WoC [62].

Despite this limitation, our results remain significant. By recognizing the potential for indirect reuse and independently created blobs, we provide a more nuanced understanding of the reuse landscape, accounting for the complexity of code propagation across projects. Excluding independently created blobs and utilizing WoC’s comprehensive list ensures that our analysis focuses on genuine reuse instances, enhancing the reliability of our findings.

7.2 External Validity

7.2.1 Blob-level Reuse. Our work focuses solely on the reuse of entire blobs, deliberately excluding the reuse of partial code segments within files. While blob-level reuse is common, it covers only a subset of the broader code reuse landscape. Blob-level reuse is more relevant to scenarios where larger code blocks, consisting of entire files or even groups of files, are reused, compared to statement- or function-level reuse. This means that our results might have an implicit bias towards programming languages or ecosystems that rely more heavily on complete files, potentially overlooking reuse practices prevalent in languages that favor modular or snippet-based reuse.

This limitation also implies that different versions of the same file, even if they differ by just one character, generate different blobs due to distinct file hashes. Consequently, blob reuse does not equate to file reuse. Defining file reuse is challenging because it is difficult to determine what constitutes equivalence between files in different projects [46]. 
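To make this concrete, the short sketch below computes Git’s content-based blob identifier, a fact of Git’s object model rather than anything specific to our dataset; the helper name is ours.

```python
# Git identifies a blob by hashing its exact content, so any edit,
# however small, produces a new blob identifier.
import hashlib

def git_blob_id(content: bytes) -> str:
    # Git's blob hash: SHA-1 over the header "blob <size>\0" plus the content.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b"print('hello')\n"))  # one blob id
print(git_blob_id(b"print('Hello')\n"))  # a different id for a 1-char change
```

Any edit, however small, therefore yields a new blob.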
This could be a potential reason for the higher level of reuse of binary blobs, as they are relatively harder to modify.

Despite these limitations, our results remain significant for several reasons:

- **Prevalent Pattern**: By concentrating on entire-blob reuse, we address a prevalent and impactful pattern in software development. This allows us to provide valuable insights into a substantial portion of code reuse practices.
- **Clarity and Precision**: Analyzing entire blobs offers a clear and precise method for identifying reuse, avoiding the ambiguity and complexity associated with defining partial file reuse. This clarity enhances the reliability of our findings.
- **Efficiency and Scalability**: Blob-level analysis is computationally efficient and scalable, enabling us to process large datasets and draw meaningful conclusions from extensive data. This scalability is important for comprehensive empirical studies.
- **Foundation for Future Research**: Our work lays the groundwork for future studies that can build on our findings to explore partial file reuse and other nuanced aspects of code reuse. By addressing a well-defined scope, we provide a solid foundation for subsequent research.

In summary, while our focus on blob reuse introduces certain limitations, it also provides clear, scalable, and impactful insights into code reuse practices. This targeted approach enables us to contribute valuable findings to the field, despite the inherent complexities of defining and analyzing file reuse. Although blob-level reuse is less granular than statement- or method-level reuse, blob-level findings also bear on sub-blob-level analyses, which would need to account for blob-level reuse. Future studies are needed to investigate the extent to which different levels and types of code reuse overlap or differ.

7.2.2 Survey Response Rate. The relatively low response rate to our survey may have been due to respondents’ perception that copying code is a sensitive subject. These concerns may have influenced responses even when developers chose to participate, and suggest that further work may be needed to design surveys that do not create such impressions.

Additionally, since many of these reuse instances happened a long time ago, developers might have forgotten about them. It is therefore important to conduct surveys regularly, capturing experiences while developers still remember their practices.

8 FUTURE WORK

8.1 Code-Snippet Granularity

We discussed in the methodology section that detecting code reuse at a granularity finer than the blob level is not practically feasible. Nevertheless, there are approaches that can make this a relatively more tractable problem. Specifically, hashing the abstract syntax tree (AST) of each code snippet (such as a class or function) in a blob and mapping blobs to these hashes could make finer-grained code reuse detection more feasible.

Assuming an average of $k$ code snippets in each of the 16 billion blobs, the parsing and hashing operation is linear in the number of snippets, i.e., $O(16 \times 10^9 \times k)$ overall. We can then perform a self-join on the resulting map of blob to syntax tree hash (b2AST), using the AST hash as the key. The self-join complexity depends on the number of unique hashes and their distribution. In the worst case, if every blob had unique hashes, the join operation would approach $O((16 \times 10^9 \times k)^2)$. 
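Before refining that bound, the per-snippet hashing step can be made concrete. Below is a minimal sketch for Python sources using the standard ast and hashlib modules; the helper name is ours, and a production version would need a parser per language.

```python
# A minimal sketch of the proposed b2AST mapping for Python sources: hash
# the AST of each top-level function or class so that formatting and
# comment changes do not alter a snippet's identity.
import ast
import hashlib

def snippet_hashes(source: str) -> dict:
    """Map each top-level function/class name to a hash of its AST dump."""
    hashes = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # ast.dump ignores formatting and, by default, line numbers.
            digest = hashlib.sha1(ast.dump(node).encode()).hexdigest()
            hashes[node.name] = digest
    return hashes

a = snippet_hashes("def f(x):\n    return x + 1  # comment\n")
b = snippet_hashes("def f(x):\n    return x+1\n")
assert a == b  # same AST, same hash, despite different formatting
```

With such a b2AST map in hand, the self-join amounts to grouping identical digests.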
However, the join complexity would typically be significantly less if there are many common hashes. A more realistic estimate assumes that the number of unique AST hashes $h$ is much smaller than the total number of entries in the b2AST map, making the join complexity closer to $O(h \\times 16 \\times 10^9 \\times k)$. This join, although potentially large, can be more feasible than pairwise comparisons of entire blobs due to the more efficient handling of common hashes.\n\nBy examining code reuse at the granularity of code snippets, we could potentially uncover a far more intricate network of reuse. This approach might reveal patterns and practices that are not noticeable when looking solely at whole-file or blob-level reuse. Although this increased complexity is challenging to manage, it offers valuable opportunities for a more comprehensive analysis of reuse [46].\n\n8.2 Dependency-Based Reuse\n\nIn this work, we aimed to demonstrate the prevalence and importance of copy-based reuse. To gain a comprehensive understanding of code reuse, it is important to analyze both copy-based and dependency-based reuse. Each type of reuse reveals different aspects of how software developers leverage existing code in their projects. By studying them side by side, we can paint a more complete picture of the extent and nuances of reuse in software development. Ignoring one in favor of the other would provide an incomplete narrative [46].\n\n8.3 Upstream Repository\n\nAs highlighted in the limitations section, we currently lack precise knowledge about the source from which a repository reuses a file. We tend to assume it is from the originating repository in all instances of copying. However, this assumption may not capture the real-world complexity of reuse. To enhance our understanding of how developers identify suitable repositories for reuse, we could potentially leverage meta-heuristic algorithms or artificial intelligence techniques. These advanced methods might enable us to predict the actual source of reused artifacts in each instance of copying with greater accuracy [46].\n\n8.4 Open Source Software Supply Chain Network\n\nDirected Acyclic Graphs (DAGs) have been instrumental in clone detection and reuse literature due to their ability to model and analyze complex relationships and dependencies between various software components.\nIn the context of copy-based reuse, the dataset created using the World of Code (WoC)\\textsuperscript{16} infrastructure can be leveraged to construct DAGs that represent the flow and reuse across different repositories.\n\nThe dataset\u2019s detailed tracking of blob copies, including their origins and destinations, provides a rich source of data to map these relationships accurately. By drawing DAGs, researchers can visualize and analyze the propagation of reused blobs, identifying critical nodes (projects or blobs) that play a central role in the reuse network. This visualization helps in understanding the structure and dynamics of reuse, highlighting patterns such as the most reused blobs, the central projects in the reuse network, and potential vulnerabilities or licensing issues propagating through these reused blobs.\n\nDAGs can reveal how reuse spreads across projects, helping to identify which projects are the primary sources of reusable blobs and how code flows between different projects. 
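As a minimal sketch of this idea, the following builds such a graph from copy instances using networkx; the triples are illustrative, not drawn from the dataset.

```python
# A minimal sketch of building a reuse graph from copy instances, assuming
# each instance is a (blob, originating project, destination project)
# triple. The sample triples are illustrative, not from the dataset.
import networkx as nx

copies = [
    ("blob1", "projA", "projB"),
    ("blob1", "projA", "projC"),
    ("blob2", "projB", "projC"),
]

g = nx.DiGraph()
for blob, origin, dest in copies:
    # One edge per copy instance, labeled with the blob that flowed along it.
    g.add_edge(origin, dest, blob=blob)

# Projects whose blobs flow to the most other projects (central sources).
print(sorted(g.out_degree(), key=lambda kv: kv[1], reverse=True))

# Everything reachable from projA: where a flaw in its blobs could propagate.
print(nx.descendants(g, "projA"))
```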
By mapping out the reuse network, it is possible to pinpoint critical points where vulnerabilities or licensing issues could propagate, allowing for targeted interventions to mitigate these risks. Understanding the reuse network also aids in developing better tools and practices for managing code quality and ensuring that reused code is maintained and updated consistently across all projects that use it.

Studies on large-scale clone detection, such as Sajnani et al. [83] and Koschke [52], provide foundational methodologies for leveraging DAGs in these contexts. These methodologies can be adapted and extended using our dataset to enhance the understanding of copy-based reuse in open source software development.

8.5 Tool Development

As discussed in the background section, different types of code reuse can have impacts on several critical areas, including security, licensing, and code quality. Understanding these implications and addressing them is important for advancing software development practices.

**Security.** Reused code can propagate vulnerabilities across multiple projects [78]. For instance, if a security flaw exists in a reused blob, it can potentially affect all projects that include this blob. Analyzing the reuse patterns can help identify critical points where vulnerabilities might spread and allow for proactive mitigation measures. There have been notable incidents where widespread code reuse led to security breaches. For example, the Heartbleed bug in OpenSSL had far-reaching impacts due to the extensive reuse of the affected code across numerous projects. Future research can focus on developing automated tools that scan reused code for known vulnerabilities and suggest patches. This proactive approach can enhance the security posture of software systems.

**Compliance.** Reused code may carry licensing obligations that need to be respected. Failure to comply with these obligations can lead to legal disputes and financial penalties. By understanding reuse patterns, organizations can ensure they meet licensing requirements. There have been instances where companies faced legal challenges due to improper reuse of code with restrictive licenses. For example, using GPL-licensed code in proprietary software without complying with the GPL terms has led to lawsuits. Developing tools that automatically check for license compliance when code is reused can help organizations avoid legal pitfalls. These tools can flag potential issues and provide guidance on how to resolve them.

**Code Quality.** Reused code may not always meet the quality standards of the adopting project. Ensuring that reused code adheres to best practices and coding standards is essential for maintaining overall code quality. Poorly written code can lead to maintenance challenges and degraded performance in adopting projects. Future work can focus on creating tools that assess the quality of reused code and suggest improvements. These tools can analyze code for adherence to coding standards, detect code smells, and recommend refactoring.

---

\textsuperscript{16}For more information about how to access this data, please visit: https://github.com/woc-hack/tutorial.

**Package Managers.** Developing package managers tailored for different programming languages and communities can be highly beneficial. These managers can offer more relevant and effective support for managing code reuse in specific environments. 
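As a minimal sketch of the reuse-tracking support discussed here, the snippet below hashes a vendored file the way Git does and looks it up in a hypothetical index of known origins; the index contents and all names are invented for illustration.

```python
# A minimal sketch of a reuse-tracking check that such a package-manager-
# style tool could perform: hash each vendored file as a Git blob and look
# it up in a (hypothetical) index of known origins and upstream versions.
import hashlib
from pathlib import Path

# Hypothetical index: blob id -> (upstream project, current upstream blob id).
KNOWN_ORIGINS = {
    "8d45fd65efe1a3e2bb164e4a7f4cf98eb6a99ab4":
        ("upstream/libfoo", "1c7e0dd0d0fbe15b0771f7d53ef123a4f1b2c3d4"),
}

def git_blob_id(content: bytes) -> str:
    # Git's content-based blob identifier.
    return hashlib.sha1(b"blob %d\x00" % len(content) + content).hexdigest()

def check_vendored(path: Path) -> None:
    blob = git_blob_id(path.read_bytes())
    entry = KNOWN_ORIGINS.get(blob)
    if entry is None:
        return  # not a known reused blob
    project, upstream = entry
    if blob != upstream:
        print(f"{path}: copied from {project}; upstream has a newer version")
```

Real tooling would populate such an index from a blob-to-origin map like the one derived from WoC in this study.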
Additionally, enhancing existing package managers with features such as reuse tracking, version control, and automated updates can improve development efficiency and reduce the risks associated with code reuse.

**Community Engagement.** Engaging with open source communities to develop tools and practices that address the unique needs of different ecosystems can ensure widespread adoption and effectiveness. Continuously gathering user feedback and iterating on the tools to enhance their functionality and usability is also important. This iterative process helps create robust and reliable tools that meet the evolving needs of software developers.

9 CONCLUSIONS

In conclusion, our study highlights the non-negligible role of copy-based reuse in open source software development. By leveraging the extensive World of Code (WoC) dataset, we provided a comprehensive analysis of code reuse, revealing that a substantial portion of open source projects engage in this practice. Our findings indicate that 6.9% of all blobs in OSS have been reused at least once, and 80% of projects have reused blobs from another project. This widespread reuse emphasizes the efficiency gains in OSS development but also raises concerns about security and legal compliance.

The variation in reuse patterns across programming languages underscores the influence of language-specific ecosystems and practices. Moreover, the higher propensity for binary blob reuse suggests a need for tailored tools to support different types of reuse. Future research should focus on improving the accuracy and comprehensiveness of reuse detection and exploring the impact of partial file reuse.

The survey results further enrich our understanding of reuse practices. We found that many creators intended their resources for reuse, indicating a collaborative mindset among developers. Reusers generally found the reused blobs helpful. Despite these positive perceptions, reusers showed relatively low concern about potential bugs and changes in the original files. This low level of concern could suggest either a high level of trust in the quality of the reused code or a lack of awareness of the associated risks. Additionally, the survey revealed a moderate interest in using package managers to handle changes to reused files. This indicates potential demand for tools that can streamline and manage code reuse more effectively.

Overall, our work provides insights into the patterns and factors affecting code reuse, advocating for better management and support tools to enhance the sustainability and security of OSS. By addressing the identified risks and leveraging the collaborative nature of the OSS community, we can improve code reuse practices and outcomes.

ACKNOWLEDGMENTS

This work was supported in part by the National Science Foundation under Award Numbers 1901102 and 2120429. The authors additionally thank Dr. James Herbsleb and Dr. Bogdan Vasilescu for their valuable advice and insightful comments, which helped improve this work. The authors also thank the reviewers for their constructive feedback and suggestions, which helped enhance the quality of this paper.

REFERENCES

[1] Qurat Ul Ain, Wasi Haider Butt, Muhammad Waseem Anwar, Farooque Azam, and Bilal Maqbool. 2019. A systematic review on code clone detection. IEEE Access 7 (2019), 86121–86144.

[2] Le An, Ons Mlouki, Foutse Khomh, and Giuliano Antoniol. 2017. Stack overflow: a code laundering platform?. 
In 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 283–293.

[3] Corey M Angst, Ritu Agarwal, Vallabh Sambamurthy, and Ken Kelley. 2010. Social contagion and information technology diffusion: The adoption of electronic medical records in US hospitals. *Management Science* 56, 8 (2010), 1219–1241.

[4] Giuliano Antoniol, Massimiliano Di Penta, and Ettore Merlo. 2004. An automatic approach to identify class evolution discontinuities. In *Proceedings. 7th International Workshop on Principles of Software Evolution*, 2004. IEEE, 31–40.

[5] Zubin Austin and Jane Sutton. 2014. Qualitative research: Getting started. *The Canadian journal of hospital pharmacy* 67, 6 (2014), 436.

[6] Tegawendé F Bissyandé, Ferdian Thung, David Lo, Lingxiao Jiang, and Laurent Réveillere. 2013. Popularity, interoperability, and impact of programming languages in 100,000 open source projects. In *2013 IEEE 37th annual computer software and applications conference*. IEEE, 303–312.

[7] Kelly Blincoe, Jyoti Sheoran, Sean Goggins, Eva Petakovic, and Daniela Damian. 2016. Understanding the popular users: Following, affiliation influence and leadership on GitHub. *Information and Software Technology* 70 (2016), 30–39.

[8] Hudson Borges, Andre Hora, and Marco Tulio Valente. 2016. Predicting the popularity of github repositories. In *Proceedings of the 12th international conference on predictive models and data analytics in software engineering*. 1–10.

[9] Lina Boughton, Courtney Miller, Yasemin Acar, Dominik Wermke, and Christian Kästner. 2024. Decomposing and Measuring Trust in Open-Source Software Supply Chains. In *Proceedings of the 2024 ACM/IEEE 44th International Conference on Software Engineering: New Ideas and Emerging Results*. 57–61.

[10] Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology. *Qualitative research in psychology* 3, 2 (2006), 77–101.

[11] Alan W Brown and Kurt C Wallnau. 1998. The current state of CBSE. *IEEE software* 15, 5 (1998), 37–46.

[12] Andrea Capiluppi, Patricia Lago, and Maurizio Morisio. 2003. Characteristics of open source projects. In *Seventh European Conference on Software Maintenance and Reengineering, 2003. Proceedings*. IEEE, 317–327.

[13] Ashley Castleberry and Amanda Nolen. 2018. Thematic analysis of qualitative research data: Is it as easy as it sounds? *Currents in pharmacy teaching and learning* 10, 6 (2018), 807–815.

[14] Nicholas A Christakis and James H Fowler. 2013. Social contagion theory: examining dynamic social networks and human behavior. *Statistics in Medicine* 32 (2013), 556–577. Issue 4. https://doi.org/10.1002/sim.5408

[15] Russ Cox. 2019. Surviving Software Dependencies: Software reuse is finally here but comes with risks. *Queue* 17, 2 (2019), 24–47.

[16] John W Creswell and J David Creswell. 2017. *Research design: Qualitative, quantitative, and mixed methods approaches*. Sage publications.

[17] Kevin Crowston and James Howison. 2005. The social structure of free and open source software development.

[18] Norman K Denzin. 2017. *The research act: A theoretical introduction to sociological methods*. Routledge.

[19] Massimiliano Di Penta, Daniel M German, Yann-Gaël Guéhéneuc, and Giuliano Antoniol. 2010. An exploratory study of the evolution of software licensing. 
In *2010 ACM/IEEE 32nd International Conference on Software Engineering*, Vol. 1. IEEE, 145\u2013154.\n\n[20] Muyue Feng, Weixuan Mao, Zimu Yuan, Yang Xiao, Gu Ban, Wei Wang, Shiyang Wang, Qian Tang, Jiahuan Xu, He Su, et al. 2019. Open-source license violations of binary software at large scale. In *2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER)*. IEEE, 564\u2013568.\n\n[21] Felix Fischer, Konstantin B\u00f6ttinger, Huang Xiao, Christian Stransky, Yasemin Acar, Michael Backes, and Sascha Fahl. 2017. Stack Overflow Considered Harmful? The Impact of Copy&Paste on Android Application Security. In *2017 IEEE Symposium on Security and Privacy (SP)*. 121\u2013136. https://doi.org/10.1109/SP.2017.31\n\n[22] Samuel W Flint, Jigyasa Chauhan, and Robert Dyer. 2021. Escaping the time pit: Pitfalls and guidelines for using time-based git data. In *2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR)*. IEEE, 85\u201396.\n\n[23] William Frakes and Carol Terry. 1996. Software reuse: metrics and models. *ACM Computing Surveys (CSUR)* 28, 2 (1996), 415\u2013435.\n\n[24] William B Frakes and Christopher J Fox. 1995. Sixteen questions about software reuse. *Commun. ACM* 38, 6 (1995), 75\u2013ff.\n\n[25] William B Frakes and Kyo Kang. 2005. Software reuse research: Status and future. *IEEE transactions on Software Engineering* 31, 7 (2005), 529\u2013536.\n\n[26] William B Frakes and Giancarlo Succi. 2001. An industrial study of reuse, quality, and productivity. *Journal of Systems and Software* 57, 2 (2001), 99\u2013106.\n\n[27] Mark Gabel and Zhendong Su. 2010. A study of the uniqueness of source code. In *Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering*. 147\u2013156.\n\n[28] Jonas Gamalielsson and Bj\u00f6rn Lundell. 2014. Sustainability of Open Source software communities beyond a fork: How and why has the LibreOffice project evolved? *Journal of systems and Software* 89 (2014), 128\u2013145.\n\n[29] CJ Michael Geisterfer and Sudipto Ghosh. 2006. Software component specification: a study in perspective of component selection and reuse. In *Fifth International Conference on Commercial-off-the-Shelf (COTS)-Based Software Systems (ICCBSS\u201905)*. IEEE, 9\u2013pp.\n\n[30] Daniel M German. 2002. The evolution of the GNOME Project. In *Proceedings of the 2nd Workshop on Open Source Software Engineering*. 20\u201324.\n\n[31] Daniel M German, Massimiliano Di Penta, Yann-Gael Gueheneuc, and Giuliano Antoniol. 2009. Code siblings: Technical and legal implications of copying code between applications. In *2009 6th IEEE International Working Conference on Mining Software Repositories*. IEEE, 81\u201390.\n[32] Daniel M German and Ahmed E Hassan. 2009. License integration patterns: Addressing license mismatches in component-based development. In 2009 IEEE 31st international conference on software engineering. IEEE, 188\u2013198.\n\n[33] Mohammad Gharehyazie, Baishakhi Ray, and Vladimir Filkov. 2017. Some from here, some from there: Cross-project code reuse in github. In 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR). IEEE, 291\u2013301.\n\n[34] Mohammad Gharehyazie, Baishakhi Ray, Mehdi Keshani, Masoumeh Soleimani Zavosht, Abbas Heydarnoori, and Vladimir Filkov. 2019. Cross-project code clones in GitHub. Empirical Software Engineering 24, 3 (2019), 1558\u20131573.\n\n[35] Antonios Gkortzis, Daniel Feitosa, and Diomidis Spinellis. 2021. 
Software reuse cuts both ways: An empirical analysis of its relationship with security vulnerabilities. Journal of Systems and Software 172 (2021), 110653.\n\n[36] Georgios Gousios. 2013. The GHTorrent dataset and tool suite. In 2013 10th Working Conference on Mining Software Repositories (MSR). IEEE, 233\u2013236.\n\n[37] Georgios Gousios and Diomidis Spinellis. 2012. GHTorrent\u2019s data from a firehose. In 2012 9th IEEE Working Conference on Mining Software Repositories (MSR). IEEE, 12\u201321.\n\n[38] Greg Guest, Arwen Bunce, and Laura Johnson. 2006. How many interviews are enough? An experiment with data saturation and variability. Field methods 18, 1 (2006), 59\u201382.\n\n[39] Stefan Haefliger, Georg Von Krogh, and Sebastian Spaeth. 2008. Code reuse in open source software. Management science 54, 1 (2008), 180\u2013193.\n\n[40] Steve Hanna, Ling Huang, Edward Wu, Saung Li, Charles Chen, and Dawn Song. 2012. Juxtapp: A scalable system for detecting code reuse among android applications. In International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 62\u201381.\n\n[41] Hideaki Hata, Raula Gaikovina Kula, Takashi Ishio, and Christoph Treude. 2021. Research artifact: The potential of meta-maintenance on GitHub. In 2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). IEEE, 192\u2013193.\n\n[42] Hideaki Hata, Raula Gaikovina Kula, Takashi Ishio, and Christoph Treude. 2021. Same file, different changes: the potential of meta-maintenance on github. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). IEEE, 773\u2013784.\n\n[43] Lars Heinemann, Florian Deissenboeck, Mario Gleirscher, Benjamin Hummel, and Maximilian Irlbeck. 2011. On the extent and nature of software reuse in open source java projects. In International Conference on Software Reuse. Springer, 207\u2013222.\n\n[44] David W Hosmer Jr, Stanley Lemeshow, and Rodney X Sturdivant. 2013. Applied logistic regression. John Wiley & Sons.\n\n[45] Katsuro Inoue, Yuya Miyamoto, Daniel M German, and Takashi Ishio. 2021. Finding code-clone snippets in large source-code collection by CCgrep. In Open Source Systems: 17th IFIP WG 2.13 International Conference, OSS 2021, Virtual Event, May 12\u201313, 2021, Proceedings 17. Springer, 28\u201341.\n\n[46] Mahmoud Jahanshahi and Audris Mockus. 2024. Dataset: Copy-based Reuse in Open Source Software. In 2024 IEEE/ACM 21st International Conference on Mining Software Repositories (MSR). IEEE, 42\u201347.\n\n[47] Lingxiao Jiang, Ghassan Misherghi, Zhendong Su, and Stephane Glondu. 2007. Deckard: Scalable and accurate tree-based detection of code clones. In 29th International Conference on Software Engineering (ICSE\u201907). IEEE, 96\u2013105.\n\n[48] Elmar Juergens, Florian Deissenboeck, Benjamin Hummel, and Stefan Wagner. 2009. Do code clones matter?. In 2009 IEEE 31st International Conference on Software Engineering. IEEE, 485\u2013495.\n\n[49] Cory J Kapser and Michael W Godfrey. 2008. \u201cCloning considered harmful\u201d considered harmful: patterns of cloning in software. Empirical Software Engineering 13 (2008), 645\u2013692.\n\n[50] Naohiro Kawamitsu, Takashi Ishio, Tetsuya Kanda, Raula Gaikovina Kula, Coen De Roover, and Katsuro Inoue. 2014. Identifying source code reuse across repositories using lcs-based source code similarity. In 2014 IEEE 14th international working conference on source code analysis and manipulation. 
IEEE, 305\u2013314.\n\n[51] Stefan Koch and Georg Schneider. 2002. Effort, co-operation and co-ordination in an open source software project: GNOME. Information Systems Journal 12, 1 (2002), 27\u201342.\n\n[52] Rainer Koschke. 2007. Survey of research on software clones.\n\n[53] Robert V Krejcie and Daryle W Morgan. 1970. Determining sample size for research activities. Educational and psychological measurement 30, 3 (1970), 607\u2013610.\n\n[54] Charles W Krueger. 2001. Easing the transition to software mass customization. In International Workshop on Software Product-Family Engineering. Springer, 282\u2013293.\n\n[55] Charles W Krueger. 1992. Software reuse. ACM Computing Surveys (CSUR) 24, 2 (1992), 131\u2013183.\n\n[56] Piergiorgio Ladisa, Henrik Plate, Matias Martinez, and Olivier Barais. 2023. Sok: Taxonomy of attacks on open-source software supply chains. In 2023 IEEE Symposium on Security and Privacy (SP). IEEE, 1509\u20131526.\n\n[57] Jure Leskovec and Christos Faloutsos. 2006. Sampling from large graphs. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. 631\u2013636.\n\n[58] Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. 2006. CP-Miner: Finding copy-paste and related bugs in large-scale software code. IEEE Transactions on software Engineering 32, 3 (2006), 176\u2013192.\n[59] Long Liang, Xiaobo Wu, Jing Deng, and Xin Lv. 2022. Research on Risk Analysis and Governance Measures of Open-source Components of Information System in Transportation Industry. *Procedia Computer Science* 208 (2022), 106\u2013110. https://doi.org/10.1016/j.procs.2022.10.017\n\n[60] Cristina V Lopes, Petr Maj, Pedro Martins, Vaibhav Saini, Di Yang, Jakub Zitny, Hitesh Sajnani, and Jan Vitek. 2017. D\u00e9j\u00e0Vu: a map of code duplicates on GitHub. *Proceedings of the ACM on Programming Languages* 1, OOPSLA (2017), 1\u201328.\n\n[61] Adolfo Lozano-Tello and Asunci\u00f3n G\u00f3mez-P\u00e9rez. 2002. BAREMO: how to choose the appropriate software component using the analytic hierarchy process. In *Proceedings of the 14th international conference on Software engineering and knowledge engineering*. 781\u2013788.\n\n[62] Yuxing Ma, Chris Bogart, Sadika Amreen, Russell Zaretzki, and Audris Mockus. 2019. World of code: an infrastructure for mining the universe of open source VCS data. In *2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)*. IEEE, 143\u2013154.\n\n[63] Yuxing Ma, Tapajit Dey, Chris Bogart, Sadika Amreen, Marat Valiev, Adam Tutko, David Kennard, Russell Zaretzki, and Audris Mockus. 2021. World of code: Enabling a research workflow for mining and analyzing the universe of open source vcs data. *Empirical Software Engineering* 26, 2 (2021), 1\u201342.\n\n[64] Yuxing Ma, Audris Mockus, Russel Zaretzki, Randy Bradley, and Bogdan Bichescu. 2020. A methodology for analyzing uptake of software technologies among developers. *IEEE Transactions on Software Engineering* 48, 2 (2020), 485\u2013501.\n\n[65] Mark Mason et al. 2010. Sample size and saturation in PhD studies using qualitative interviews.\n\n[66] Hafedh Mili, Fatma Mili, and Ali Mili. 1995. Reusing software: Issues and research directions. *IEEE transactions on Software Engineering* 21, 6 (1995), 528\u2013562.\n\n[67] Michael Mitzenmacher and Eli Upfal. 2017. *Probability and computing: Randomization and probabilistic techniques in algorithms and data analysis*. Cambridge university press.\n\n[68] Audris Mockus. 2007. 
Large-scale code reuse in open source software. In *First International Workshop on Emerging Trends in FLOSS Research and Development (FLOSS’07: ICSE Workshops 2007)*. IEEE, 7–7.

[69] Audris Mockus. 2019. Insights from open source software supply chains (keynote). In *Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering* (Tallinn, Estonia) (ESEC/FSE 2019). Association for Computing Machinery, New York, NY, USA, 3. https://doi.org/10.1145/3338906.3342813

[70] Audris Mockus. 2022. Tutorial: Open Source Software Supply Chains. https://mockus.org/papers/SSCISEC22.pdf

[71] Audris Mockus. 2023. Securing Large Language Model Software Supply Chains. https://mockus.org/papers/wocllm.pdf ASE’23 LLMs in Software Engineering.

[72] Audris Mockus, Diomidis Spinellis, Zoe Kotti, and Gabriel John Dusing. 2020. A complete set of related git repositories identified via community detection approaches based on shared commits. In *Proceedings of the 17th International Conference on Mining Software Repositories*. 513–517.

[73] Chinenye Okafor, Taylor R Schorlemmer, Santiago Torres-Arias, and James C Davis. 2022. Sok: Analysis of software supply chain security by establishing secure design properties. In *Proceedings of the 2022 ACM Workshop on Software Supply Chain Offensive Research and Ecosystem Defenses*. 15–24.

[74] Joel Ossher, Sushil Bajracharya, and Cristina Lopes. 2010. Automated dependency resolution for open source software. In *2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010)*. IEEE, 130–140.

[75] David Lorge Parnas. 1972. On the criteria to be used in decomposing systems into modules. *Commun. ACM* 15, 12 (1972), 1053–1058.

[76] Shi Qiu, Daniel M German, and Katsuro Inoue. 2021. Empirical study on dependency-related license violation in the javascript package ecosystem. *Journal of Information Processing* 29 (2021), 296–304.

[77] Baishakhi Ray, Daryl Posnett, Vladimir Filkov, and Premkumar Devanbu. 2014. A large scale study of programming languages and code quality in github. In *Proceedings of the 22nd ACM SIGSOFT international symposium on foundations of software engineering*. 155–165.

[78] David Reid, Mahmoud Jahanshahi, and Audris Mockus. 2022. The extent of orphan vulnerabilities from code reuse in open source software. In *Proceedings of the 44th International Conference on Software Engineering*. 2104–2115.

[79] Jeffrey A. Roberts, Il-Horn Hann, and Sandra A. Slaughter. 2006. Understanding the motivations, participation, and performance of open source software developers: A longitudinal study of the apache projects. *Management Science* 52, 7 (July 2006), 984–999.

[80] Chanchal Kumar Roy and James R Cordy. 2007. A survey on software clone detection research. *Queen’s School of Computing TR* 541, 115 (2007), 64–68.

[81] Chanchal K Roy, James R Cordy, and Rainer Koschke. 2009. Comparison and evaluation of code clone detection techniques and tools: A qualitative approach. *Science of computer programming* 74, 7 (2009), 470–495.

[82] Julia Rubin and Marsha Chechik. 2013. A survey of feature location techniques. 29–58 pages.

[83] Hitesh Sajnani, Vaibhav Saini, Jeffrey Svajlenko, Chanchal K Roy, and Cristina V Lopes. 2016. Sourcerercc: Scaling code clone detection to big-code. In *Proceedings of the 38th international conference on software engineering*. 
1157\u20131168.\n\n[84] Mohammadreza Samadi, Alexander Nikolaev, and Rakesh Nagi. 2016. A subjective evidence model for influence maximization in social networks. *Omega* 59 (2016), 263\u2013278.\n[85] Susan Elliott Sim, Charles LA Clarke, and Richard C Holt. 1998. Archetypal source code searches: A survey of software developers and maintainers. In Proceedings. 6th International Workshop on Program Comprehension. IWPC\u201998 (Cat. No. 98TB100242). IEEE, 180\u2013187.\n\n[86] Manuel Sojer and Joachim Henkel. 2010. Code reuse in open source software development: Quantitative evidence, drivers, and impediments. Journal of the Association for Information Systems 11, 12 (2010), 2.\n\n[87] Chintakindi Srinivas, Vangipuram Radhakrishna, and CV Guru Rao. 2014. Clustering and classification of software component for efficient component retrieval and building component reuse libraries. Procedia Computer Science 31 (2014), 1044\u20131050.\n\n[88] Student. 1908. The probable error of a mean. , 25 pages.\n\n[89] Jane Sutton and Zubin Austin. 2015. Qualitative research: Data collection, analysis, and management. The Canadian journal of hospital pharmacy 68, 3 (2015), 226.\n\n[90] Jeffrey Svajlenko, Iman Keivanloo, and Chanchal K Roy. 2013. Scaling classical clone detection tools for ultra-large datasets: An exploratory study. In 2013 7th International Workshop on Software Clones (IWSC). IEEE, 16\u201322.\n\n[91] Jeffrey Svajlenko and Chanchal K Roy. 2014. Evaluating modern clone detection tools. In 2014 IEEE international conference on software maintenance and evolution. IEEE, 321\u2013330.\n\n[92] Jeffrey Svajlenko and Chanchal K Roy. 2015. Evaluating clone detection tools with bigclonebench. In 2015 IEEE international conference on software maintenance and evolution (ICSME). IEEE, 131\u2013140.\n\n[93] Jason Tsay, Laura Dabbish, and James Herbsleb. 2014. Influence of social and technical factors for evaluating contribution in GitHub. In Proceedings of the 36th international conference on Software engineering. 356\u2013366.\n\n[94] Bogdan Vasilescu, Alexander Serebrenik, and Vladimir Filkov. 2015. A data set for social diversity studies of github teams. In 2015 IEEE/ACM 12th working conference on mining software repositories. IEEE, 514\u2013517.\n\n[95] David M Weiss and Chi Tau Robert Lai. 1999. Software product-line engineering: a family-based software development process. Addison-Wesley Longman Publishing Co., Inc.\n\n[96] Katrin Weller and Katharina E Kinder-Kurlanda. 2016. A manifesto for data sharing in social media research. In Proceedings of the 8th ACM Conference on Web Science. 166\u2013172.\n\n[97] Martin White, Michele Tufano, Christopher Vendome, and Denys Poshyvanyk. 2016. Deep learning code fragments for code clone detection. In 2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 87\u201398.\n\n[98] Dapeng Yan, Yuqing Niu, Kui Liu, Zhe Liu, Zhiming Liu, and Tegawend\u00e9 F Bissyand\u00e9. 2021. Estimating the attack surface from residual vulnerabilities in open source software supply chain. In 2021 IEEE 21st International Conference on Software Quality, Reliability and Security (QRS). IEEE, 493\u2013502.\n\n[99] Robert K Yin. 2015. Qualitative research from start to finish. Guilford publications.\n\n[100] Yuhang Zhao, Ruigang Liang, Xiang Chen, and Jing Zou. 2021. Evaluation indicators for open-source software: a review. 
Cybersecurity 4 (2021), 1\u201324.", "source": "olmocr", "added": "2025-06-23", "created": "2025-06-23", "metadata": {"Source-File": "/home/nws8519/git/adaptation-slr/studies_pdfs/011-jahanshahi.pdf", "olmocr-version": "0.1.76", "pdf-total-pages": 47, "total-input-tokens": 119572, "total-output-tokens": 42380, "total-fallback-pages": 0}, "attributes": {"pdf_page_numbers": [[0, 4123, 1], [4123, 8831, 2], [8831, 12834, 3], [12834, 16766, 4], [16766, 21035, 5], [21035, 25039, 6], [25039, 29395, 7], [29395, 33840, 8], [33840, 37993, 9], [37993, 42303, 10], [42303, 46954, 11], [46954, 47001, 12], [47001, 51675, 13], [51675, 56617, 14], [56617, 61263, 15], [61263, 65407, 16], [65407, 69375, 17], [69375, 73238, 18], [73238, 76663, 19], [76663, 80236, 20], [80236, 82229, 21], [82229, 85683, 22], [85683, 87090, 23], [87090, 91139, 24], [91139, 92673, 25], [92673, 96459, 26], [96459, 100117, 27], [100117, 102518, 28], [102518, 104410, 29], [104410, 107905, 30], [107905, 112550, 31], [112550, 115191, 32], [115191, 119210, 33], [119210, 123093, 34], [123093, 126222, 35], [126222, 129456, 36], [129456, 133322, 37], [133322, 137037, 38], [137037, 141175, 39], [141175, 145240, 40], [145240, 148842, 41], [148842, 153039, 42], [153039, 156901, 43], [156901, 162793, 44], [162793, 168245, 45], [168245, 173948, 46], [173948, 177111, 47]]}}
|
|
{"id": "11535bc79eafda826542462fb85901d80b961212", "text": "License Update and Migration Processes in Open Source Software Projects\n\nChris Jensen \nInstitute for Software Research \nUniversity of California, Irvine \nIrvine, CA USA 92697-3455 \nPhone: +1 (949) 824-0573 \nEmail: cjensen@ics.uci.edu\n\nWalt Scacchi \nInstitute for Software Research \nUniversity of California, Irvine \nIrvine, CA USA 92697-3455 \nPhone: +1 (949) 824-4130 \nEmail: wscacchi@ics.uci.edu\n\nAbstract\n\nOpen source software (OSS) has increasingly been the subject of research efforts. Central to this focus is the nature under which the software can be distributed, used, and modified and the causes and consequent effects on software development, usage, and distribution. At present, we have little understanding of, what happens when these licenses change, what motivates such changes, and how new licenses are created, updated, and deployed. Similarly, little attention has been paid to the agreements under which contributions are made to OSS projects and the impacts of changes to these agreements. We might also ask these same questions regarding the licenses governing how individuals and groups contribute to OSS projects. This paper focuses on addressing these questions with case studies of processes by which the Apache Software Foundation's creation and migration to Version 2.0 of the Apache Software License and the NetBeans project's migration to the Joint Licensing Agreement.\n\nKeywords\n\nOpen source, license evolution, process, Apache, NetBeans\n\nIntroduction\n\nSoftware process research has investigated many aspects of open source software (OSS) development in the last several years, including release processes,\ncommunication and collaboration, community joining, and project governance. The central point of Lawrence Lessig's book \u201cCode\u201d is that the hardware and software that make up cyberspace also regulate cyberspace. He argues that code both enables and protects certain freedoms, but also serves as to control cyberspace. Software licenses codify these freedoms and regulations by setting forth the terms and conditions for software use, modification, and distribution of a system and any changes made to it. For that reason, others have suggested that licenses serve as contracts for collaboration. In the case of non-OSS licenses, that contract may indicate no collaboration, but rather strict separation between users and developers. OSS licenses, by contrast range widely in permissiveness, some granting more rights to the original authors and some granting more rights to consumers of OSS software. While research has examined OSS licenses to great detail, we are only beginning to understand license evolution. Just as OSS code is not static, neither are the licenses under which it is distributed. Research into license evolution is just beginning. However, when licenses change, so too the contracts for collaboration change. This paper seeks to provide an incremental step to understanding how changes in software licensing impact software development processes.\n\nWhy does understanding license update and migration matter? Companies using OSS software need to know how changes affect their use, modification, and distribution of a software system. License compatibility in OSS has long been a topic of debate. Research is only beginning to provide tools for assistance in resolving software license compatibility [1]. 
OSS project participants need to understand why changes are being made and whether the changes align with their values and business models (e.g., enabling new avenues of license compatibility offering strategic benefit, or opening up new channels of competition). As a project sponsor or host, you may be concerned about how to best protect the software system and your user community, but also your business model. You typically want a license that will attract a large number of developers to your project [2] while at the same time allowing you to make a profit and stay in business.

While licenses such as the GNU General Public License (GPL), the Berkeley Software Distribution (BSD) license, and the Apache License are well known, we rarely consider another type of license agreement critical to understanding collaboration in OSS projects: individual contributor license agreements (CLAs) and organizational contributor license agreements (OCLAs), for contributors from organized entities. In non-OSS software development, the contract for collaboration is typically an employment contract, often stating that all intellectual property rights pertaining to source code written by an employee are property of the employer. This provides the employer with complete control over the rights granted in licensed software. In OSS development, multiple developers contribute to a single software system. Without copyright assignment or a CLA, changing a software license requires the consent of every contributor to that system. We observed this situation in the case of the Linux kernel, which suggested that without a CLA, license evolution can become inhibited or prevented as the number of contributors, each with differing values and objectives, increases. To understand how changes in software licenses affect software development processes, we must also investigate changes in CLAs.

We address these issues with two case studies. The first examines the creation and deployment of the Apache Software License, Version 2.0. The second looks at an update to the contributor license agreement in the NetBeans project.

**Background Work**

Legal scholars, such as St. Laurent [3] and Larry Rosen [4], former general counsel and secretary for the Open Source Initiative (OSI), have written extensively on license selection. They note that quite often, the choice of license is somewhat outside the control of a particular developer. This is certainly the case for code that is inherited or dependent on code that is either reciprocally licensed, or at the very least, requires a certain license for the sake of compatibility. However, outside such cases, both St. Laurent and Rosen advocate the use of existing, well-tested, and well-understood licenses as opposed to the practice of creating new licenses. Such license proliferation is seen as a source of confusion among users and is often unnecessary given the extensive set of licenses that already exist for a diverse set of purposes. Lerner and Tirole [5] observe specific determinant factors in license selection. Of the 40,000 SourceForge projects studied, projects geared towards end-users tended towards more restrictive license terms, while projects directed towards software developers tended towards less restrictive licenses.
Highly restrictive licenses were also found to be more common in consumer software (e.g., games) but less common for software on consumer-oriented platforms (e.g., Microsoft Windows) as compared to non-consumer-oriented platforms. Meanwhile, Rosen specifically addresses the issue of relicensing, commenting that license changes made by fiat are likely to fracture the community. This case of relicensing is exactly the focus of our case studies here.

The drafting and release of the GNU General Public License, Version 3.0 was done in a public fashion, inviting many prominent members of the OSS community to participate in the process. In fact, we even see a sort of prescriptive process specification outlining, at a high level, how the new license was to be created. This license revision process is interesting from the perspective that the license in question is not used by one project or one foundation, but rather is an update of the most commonly used open source license in practice. As such, the process of its update and the impact of its revision on software development are both wide ranging and widely discussed.

Di Penta et al. [6] examined changes to license headers in source code files in several major open source projects. Their three primary research questions sought to understand how frequently licensing statements in source code files change, the extent of the changes, and how copyright years change in source code files. Their work shows that most of the changes observed to source code files are small, though even small changes could signify a migration to a different license. The authors also note that little available research speaks to license evolution, pointing to the need for greater understanding in this area.

Lindman et al. [2] examine how companies perceive open source licenses and what major factors contribute to license choice in companies releasing open source software. The study reveals a tight connection between licensing choice and business model, patent potential, the motivation for community members to participate in development, control of project direction, company size, and network externalities (compatibility with other systems).

Lindman et al. provide a model of a software company, its developers, and users in the context of an OSS system developed and released from a corporate environment [2]. However, few systems are developed in complete isolation.

Figure 1. A model of software production and consumption with open source licensing

Rather, they leverage existing libraries, components, and other systems developed by third parties. Moreover, as Goldman and Gabriel point out, open source is more than just source code in a public place released under an OSS license [7]; communities matter. Figure 1 shows the production and consumption of open source software, highlighting the impact of software licenses and contributor license agreements.

Going a step further, Oreizy [8] describes a canonical high-level software customization process for systems and components, highlighting intra-organizational software development processes and resource flow between a system application developer, an add-on developer, a system integrator, and an end user.

Similarly, we have examined such concepts in the context of software ecosystems [9] and process interaction.
Software license change can precipitate integrative forms of process interaction in the case of dual and multi-licensing by enabling new opportunities for use of software systems upstream of a project, to provide added functionality or support, as well as downstream, vis-à-vis use as a library, plugin development, support tool development, and customization and extension. In such cases, software source becomes a resource flowing between interacting projects. However, license change can also trigger interproject process conflict if new license terms render two systems incompatible. At that point, the resource flow between projects can be cut off, when downstream consumers of software source code no longer receive updates. A more common example with non-OSS is license expiration. License-based interproject process conflicts can also manifest as unmet dependencies in software builds or an inability to fix defects or add enhancements to software, resulting in process breakdown and, failing recovery, project failure. OSS licenses, however, guarantee that even when conflict occurs, recovery is possible because the source is available and can be forked.

Methodology

The case studies in this report are part of an ongoing, multi-year research project discovering and modeling open source software processes. Our research methodology is ethnographically informed, applying grounded theory to the analysis of artifacts found in OSS projects. Our primary data sources were mailing list messages from the archives of the Apache and NetBeans projects. However, we also found supplementary documentation on each project's website that served to inform our study. These supplementary documents were often, though not always, referenced by the messages in the mailing list. Cases regarding the NetBeans project all took place between April and June of 2003, involving over 300 email messages, whereas the Apache cases were spread over several discrete time periods and consisted of more than 350 messages.

Case selection happened in two ways. For NetBeans, the cases arose during our study of requirements and release processes, having stood out as prominent issues facing the community during the time period studied. Although we observed additional incidents appropriate for discussion, the three cases selected fit together nicely as a cohesive story. This approach was also used in the study of the Apache project. However, due to a lower incident frequency, we expanded our study over a longer time period to find incidents that proved substantial. As a testament to the nature of project interaction, issues raised in mailing list discussions proved to be short-lived, either because they were resolved quickly or because the conversation simply ceased. It is possible to suggest this is the normal behavior pattern for both projects. A few issues proved outliers, having more focused discussions, and these were selected for further study. We also observed a tendency for discussions to play out in a series of short-lived discussion sessions. A topic would be raised, receiving little or no attention. Then, at a later time, it would be raised again. The JCA discussion in NetBeans and the Subversion migration discussion in the Apache project demonstrated such conversational resurgence. We observed, in general, that discussion topics carry a certain conversational momentum.
Topics with a high degree of momentum tended to have lengthier discussion periods or frequent discussion sessions until fully resolved or abandoned, while topics with a low degree of momentum were addressed quickly or simply died off. The causes and factors affecting changes in momentum were not investigated, as they lay too far afield from the focus of this study. We do note that although consensus by attrition has been cited in other communities (e.g., [10 and 11]), we did not observe it in effect in any of the cases studied; rather, the primary participants in discussions remained active in their respective projects for several months following the reported incidents. The creation of the Apache License, Version 2.0 was brought to our attention by a colleague familiar with the project. Data for the Apache licensing case was gathered from email messages sent to a mailing list established for the purpose of discussing the proposed changes.

Having experienced difficulties building our own search engine to support process discovery, we still faced the challenge of keeping track of process data once we found it as we built our models. Up until this point, our approach to providing process traceability was simply to include links to project artifacts in our models. However, this strategy did not help us build the models themselves. We returned the search problem to the projects themselves, using their own search engines to locate process data, looking for more lightweight support for discovery.

Our current strategy for providing computer support for process discovery returns to using each project's own search engine to locate process information. We have operationalized the reference model as an OWL ontology with the Protégé ontology editor [12], using only the OWL class and individual constructs to store process concepts and their associated search queries, respectively. Secondly, we built a Firefox plugin, Ontology [13], to display the reference model ontology in the Firefox web browser. Next, we enlisted the Zotero citation database Firefox plugin [14] to store process evidence elicited from project data, integrating the two plugins such that each datum added to the citation database from a project artifact is automatically tagged with the selected reference model entities.
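To make this encoding concrete, the following is a minimal sketch of how one process concept and an associated search query could be stored as an OWL class and individual. It uses Python's rdflib library rather than Protégé itself, and the namespace, class name, and query string are illustrative assumptions, not the authors' actual reference model.

```python
from rdflib import Graph, Literal, Namespace, OWL, RDF, RDFS

# Hypothetical namespace for a process reference-model ontology.
PM = Namespace("http://example.org/oss-process#")

g = Graph()
g.bind("pm", PM)

# A process concept is modeled as an OWL class...
g.add((PM.LicenseRevision, RDF.type, OWL.Class))
g.add((PM.LicenseRevision, RDFS.label, Literal("License revision activity")))

# ...and each search query used to find evidence of that concept
# becomes an individual of that class, with the query text attached.
q = PM.licenseRevisionQuery1
g.add((q, RDF.type, PM.LicenseRevision))
g.add((q, RDFS.comment, Literal('"proposed Apache License" patent termination')))

print(g.serialize(format="turtle"))
```

Under this scheme, a tag applied to a captured datum can be resolved back to its reference model class, which is what gives the plugin integration its traceability.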
The use of a citation database as a research data repository may seem unintuitive. Zotero, however, has proven well suited to our needs. Like many Firefox plugins, Zotero can create records simply from highlighted sections of a web document, though the creation of arbitrary entries (not gleaned from document text selections) is also possible. It can also save a snapshot of the entire document for later review, which is useful given the high frequency of changes of some web documents—changes that evidence steps in a software process. The tag, note, and date fields for each entry are useful for recording reference model associations and memos about the entry for use in constructing process steps and ascertaining their order. A screenshot of Zotero with Ontology appears in Figure 2.

The plugin integration greatly facilitates the coding of process evidence and provides traceability from raw research data to analyzed process models. As the tool set is browser-based, it is not limited to analysis of a particular data set, whether local or remote. Moreover, the tool set does not limit users to a single ontology or Zotero database, thereby allowing users to construct research models using multiple ontologies describing other (e.g., non-OSS process) phenomena and to reuse the tool set for analysis of additional data sets. Thus, it may be easily appropriated for grounded theory research in other fields of study.

The elicitation of process evidence is still search driven. Rather than use one highly customized search engine for all examined data repositories, the search task has been shifted back to the organizations of study. This decision has several implications in comparison with the previous approach, both positive and negative. Using an organization's own search engine limits our ability to extract document-type-specific metadata; however, among the organizations we have studied, the projects' search tools provide greater coverage of document and artifact types than Lucene handled at that time. Furthermore, this approach does not suffer the data set limitations imposed by web crawler exclusion rules. The ability to query the data set in a scripted fashion has been lost, yet some scientists would see this as a gain. The use of computer-assisted qualitative data analysis software (CAQDAS) historically has put into question the validity of both the research method and results [15, 16].

This tool was still quite unfinished as we began governance process discovery and modeling. As we added functionality, we had to return to some of our data sources and recapture the data. Although we had high hopes of using the integrated timeline feature to assist in process activity composition and sequencing, the time and date support within Zotero's native date format was insufficiently precise. With provisions only for year, month, and day, there is no ability to capture action sequences that happen on the same day. After adding support for greater date and time precision, we found having to enter the date and time for every piece of data we captured rather tedious. Eventually we had to prioritize completion of discovery and modeling ahead of computer support for process discovery, and we had to disable the time and date entry. Unable to utilize Zotero to our intended effect in discovery and modeling, our efforts with Zotero remain in progress, pending usability improvements.

Creation and Migration to the Apache License, Version 2.0

The Apache Software Foundation created a new version of its license at the end of 2003 and beginning of 2004. Roy Fielding, then director of the ASF, announced the license proposal on 8 November 2003 [17], inviting review and discussion on a mailing list set up specifically for that purpose. Per Roy's message, the motivations for the proposed license included:

- Reducing the number of frequently asked questions about the Apache License
- Allowing the license to be usable by any (including non-Apache) projects
- Requiring a patent license on contributions that necessarily infringe the contributor's own patents
- Moving the full text of the license and specific conditions outside the source code

Roy further indicated a desire to have a license compatible with other OSS licenses, notably the GPL.

As you can see from Figure 3, most of the discussion took place in mid-November of 2003. In fact, given that ApacheCon ran from 16-19 November, we can see a high message density in the days leading up to ApacheCon, with a steady rate continuing for a few days afterward. Beyond this, the frequency becomes sparse.
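A message-density analysis like the one behind Figure 3 can be reproduced with standard tooling. The following is a minimal sketch, not the authors' instrumentation: it tallies per-day message counts from a local copy of a monthly mbox archive (the file name is an assumption) using only Python's standard library.

```python
import mailbox
from collections import Counter
from email.utils import parsedate_to_datetime

# Tally messages per day from a local copy of a monthly archive,
# e.g. the November 2003 archive-license mbox (file name assumed).
counts = Counter()
for msg in mailbox.mbox("200311.mbox"):
    date_header = msg.get("Date")
    if not date_header:
        continue
    try:
        counts[parsedate_to_datetime(date_header).date()] += 1
    except (TypeError, ValueError):
        continue  # skip malformed Date headers

for day in sorted(counts):
    print(day, "#" * counts[day])  # crude text histogram of message density
```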
An update to the proposed license was announced on 24 December 2003, after some internal review, a part of the process that is not publicly visible. This update prompted a brief discussion. A second active time period is observable in January 2004, when Fielding announces a final update (20 January 2004) and that the final version of the license has been approved by the board [18 and 19] (21 January 2004).

The primary discussion point of the creation and migration to the 2.0 version of the Apache License centered on a patent clause in the proposed license. According to Brian Behlendorf, who was serving on the ASF board of directors at the time, the ASF's patent-related goals were to “prevent a company from sneaking code into the codebase covered by their own patent and then seeking royalties from either the ASF or end-users” [20]. The clause in question read:

5. Reciprocity. If You institute patent litigation against a Contributor with respect to a patent applicable to software (including a cross-claim or counterclaim in a lawsuit), then any patent licenses granted by that Contributor to You under this License shall terminate as of the date such litigation is filed. In addition, if You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work itself (excluding combinations of the Work with other software or hardware) infringes Your patent(s), then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. [21]

Consequences of this clause sparked discussion in a few areas, mainly surrounding the first sentence of the clause regarding license termination. Legal representatives from industry stated objections to losing usage rights for patent litigation regarding any software, even software unrelated to that covered by the license [22], proposing alternative wordings that would achieve the stated license goals but restrict the trigger to litigation pertaining to patents covered by the ASF-licensed code [23]. Uncertainty regarding the roles of people in the license revision process [24] and proposed changes [25] created additional confusion regarding the patent reciprocity stance.

Eben Moglen, General Counsel for the Free Software Foundation (FSF), adds that the first sentence of the license clause carries great risk of unintended and serious consequences, and is an inappropriate vehicle for protecting free software against patent litigation [26]. As such, the FSF deemed that the clause renders the license incompatible with version 2 of the GPL, failing one of the goals of the proposed Apache License.

Brian Carlson reports that the Debian community's consensus is that the proposed license does not meet the criteria for *Free Software Licenses* under the Debian Free Software Guidelines [27]. Consequently, code licensed as such would be sandboxed into the non-free archive and, therefore, neither automatically built for Debian distributions nor given quality assurance attention. Again, the license termination aspect of the reciprocity clause is cited as the critical sticking point [28], with several members of the Debian community arguing that free software licenses should only restrict modification and distribution, but not usage, of free software.

The patent reciprocity clause was not entirely rejected. There was support for extending it to provide mutual defense against patent litigation attacks against all open source software [29].
The idea was quickly nixed on the grounds that it could lead to users being attacked and unable to defend themselves if someone were to maliciously violate a user's patent on an unrelated piece of software and create an open source version. In such a scenario, the user would have to choose between using Apache-licensed software and losing all their patents [30].

On 18 November, Fielding indicates that there have been “several iterations on the patent sentences, mostly to deal with derivative work” [24], mentioning he will probably include the suggested changes in the patent language recommended by one of the legal representatives from industry. Fielding notes that he has been in contact with representatives from other organizations, among them Apple, Sun, the OSI, Mozilla, and a few independent attorneys, although the details of these portions of the process remain hidden.

The next milestone in the process occurs on 24 December, when Fielding mentions that a second draft, version 1.23, has been prepared after internal review due to extensive changes [31], and has been posted to the proposed licenses website [32] and the mailing list. The new proposed license [33] incorporates many of the proposed changes, including the removal of the contested first sentence of the patent reciprocity clause, leaving the generally agreed-upon patent termination condition:

> If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.

The 1.23 version of the license received little feedback on the license discussion mailing list. Aside from definition clarifications, there was an inquiry about GPL compatibility. Behlendorf commented that Moglen's suggestions had been incorporated to address the two issues with GPL compliance, but that he had been contacted earlier in the week to take a look at the current draft [34]. As a result, Behlendorf (on 7 January 2004) offers that the issues presented have been addressed to his satisfaction and is willing to propose the license to the board at the January 2004 meeting [35]. However, before the board meeting, Fielding announces a version 1.24, featuring a change to the definition of “Contributor” [36], and a version 1.25 very shortly thereafter to address the way “Copyright” is represented, due to various laws and the use of “(C)” to indicate copyright [37]. Finally, the Apache License, Version 2.0 was approved by the ASF board by a unanimous vote on 20 January 2004 [18] and announced to the mailing list by Fielding the following day [19]. Per the board meeting minutes:

WHEREAS, the foundation membership has expressed a strong desire for an update to the license under which Apache software is released,

WHEREAS, proposed text for the new license has been reworked and refined for many, many months, based on feedback from the membership and other parties outside the ASF,

NOW, THEREFORE, BE IT RESOLVED, that the proposed license found at http://www.apache.org/licenses/proposed/LICENSE-2.0.txt is officially named the Apache Software License 2.0.
To grant a sufficient transition time, this license is to be used for all software releases from the Foundation after the date of March 1st, 2004.

The conversation continued on, briefly, to address two points. Firstly, a return to the GPL compatibility discussion: Don Armstrong requested verification as to whether Moglen/the FSF had identified the license as GPL-compatible (Fielding's announcement claimed it was) [38]. Fielding responds, saying Moglen sent a private communication commenting on the license compatibility and, furthermore, that it was the belief of the ASF that “a derivative work consisting of both Apache Licensed code and GPL code can be distributed under the GPL,” and, as such, there wasn't anything further to consider, as far as the ASF was concerned [39]. Incidentally, the FSF's position is that, due to the patent issue, the Apache License 2.0 is GPLv3-compatible but not GPLv2-compatible [40]. Secondly, Vincent Massol requested information about moving his Apache sub-project to the ASL2 license and what file license headers should be used [41], to which Behlendorf responds [42]. A flow graph of the license creation and migration process appears in Figure 4.

Introduction of the Joint License Agreement

Rosen [4] suggests that copyright assignment is sought for two purposes:

1. So the project can defend itself in court without the participation and approval of its contributors.

2. To give the project (and not the contributor) the right to make licensing decisions, such as relicensing, about the software.

The NetBeans case is interesting because it is not simple copyright assignment, but rather affords both the contributor and the project (Sun Microsystems, specifically) equal and independent copyright to contributed source.

The Joint License Agreement (JLA) was introduced to the NetBeans project on 28 April 2003 by Evan Adams, a prominent project participant working for Sun Microsystems [43]. Adams states that the JLA was being introduced in response to observations of Mozilla and other open source projects by Sun's legal team, which believed that Sun required full copyright authority to protect the NetBeans project from legal threats and to provide Sun with the flexibility to adapt the NetBeans license over time. Under the proposed agreement, contributors (original authors) would retain all copyrights independently for project contributions, and any previous contributions whose authors did not agree to the terms of the JCA would have to be removed from the source tree. The discussion spanned ninety messages from seventeen individuals over nearly two months, with a follow-up discussion consisting of forty-six messages from fourteen individuals (eleven of whom participated in the earlier discussion) over a third month. The discussion, which began at the end of April 2003, continued through July (with a few sporadic messages extending out to September), long after the deadline for requiring the JLA for project contributions.

The process for the license format change seems simple. The particulars of the proposed license received early focus in the discussion. As the discussion progressed, concern shifted away from details of the license agreement to the way in which the change was proposed. In the course of discussion, it was revealed that switching to the JLA was an idea proposed by the Sun legal counsel, and that the decision to adopt it was made internally, unilaterally, and irrevocably by Sun without the involvement of the project at large.
The adoption decision raised questions regarding decision rights and transparency within the project.

While recognizing that Sun-employed contributors were responsible for a majority of project effort, non-Sun contributors took the lack of transparency and consideration in the decision-making process as disenfranchisement. In a follow-up discussion, project members further expressed fears that giving Sun full copyright of contributed code could lead to reclassification of volunteer-contributed code in objectionable ways. More significantly, they feared the change could impact the copyright of projects built upon the NetBeans codebase but not contributed back to the NetBeans source repository.

In time, most of the “corner case” concerns about the license agreement were addressed. However, ultimately non-Sun-employed contributors were still in the position of having to trust Sun to act in an acceptable manner with a grant of full copyright. Moreover, the discussion drew out larger concerns regarding Sun's position of leadership and control of the project, and regarding transparency in decision making. A flow graph of the JCA introduction process appears in Figure 5.

Discussion and Conclusions

The two cases presented are not directly comparable. The Apache study looks at the process of creating a new license to be used by all projects under the domain of the Apache Software Foundation. The NetBeans study focuses on the adoption of a new license agreement for contributors to the NetBeans IDE and platform. Software source licenses govern the rights and responsibilities of software consumers to (among other things) use, modify, and distribute software. Contributor license agreements (CLAs), on the other hand, govern the rights and responsibilities to (among other things) use, modify, and distribute contributions: the rights granted to the organization to which the contributions are submitted, and those retained by the contributor. The new CLA stated that copyright of project contributions would be jointly owned by the originating contributors as well as the project's benefactor, Sun Microsystems. Code contribution agreements may not be of interest to end users of software executables. However, the OSS movement is known for its tendency towards user-contributors; that is, users who contribute to the development of the software and developers who use their own software.

If we consider, specifically, the license changes in the Apache and NetBeans projects, both were introduced as inevitable changes by persons of authority in each project (founder Roy Fielding of Apache and Evan Adams of Sun Microsystems for NetBeans). The initiators of the discussion both presented the rationale for making the changes. For Apache, the move was motivated by a desire to increase compatibility with other licenses, reduce the number of questions about the Apache license, move the license text outside the source code, and require a patent license on contributions where necessary. For NetBeans, the motivations were to protect the project from legal threats and provide Sun the ability to change the license in the future. In the Apache case, the motivations for making the changes went unquestioned. The discussion focused on what objectives to achieve with the change and how best to achieve them.
The former had to do with a (minority) subset of participants who saw the license change as an opportunity to affect software development culture, altering the direction of the software ecosystem as a means of governance on a macro level. The latter had to do with making sure the verbiage of the license achieved its intended objectives without unintended consequences (such as those sought by the former group). In the NetBeans case, the discussion focused on the differences between the licenses and their effect on non-sponsoring-organization participants (meso-level project governance). Given the context of the surrounding cases, the structural and procedural governance of the project was also questioned.

The area of the NetBeans license change that received the greatest push-back was granting the sponsoring organization the right to change the license unilaterally at any point in the future. This right was similarly granted to the ASF in the Apache contributor license agreement (CLA) [44], a point that was not lost on participants in the NetBeans license change discussions [45]. Why did this issue receive push-back in NetBeans and not Apache? West and O'Mahony [46] caution that, unlike community-initiated projects, sponsored OSS projects must achieve a balance between establishing pre-emptive governance design (as we saw here) and establishing boundaries between commercial and community ownership and control. The surrounding cases served to create an atmosphere of distrust within the project. The distrust led to fears that contributions from the sponsoring organization would become closed off from the community, perhaps saved for the organization's commercial version of the product, leaving the sponsoring organization as free-riders [47 and 48] profiting off the efforts of others without giving back [49], or that the change would otherwise limit what project participants could do with project code.

Perhaps the most striking difference in the way the two license changes were introduced is that the Apache case invited project participants (as well as the software ecosystem and the public at large) to be a part of the change, whereas the NetBeans case did not. Participants in the NetBeans project were left without a sense of transparency in the decision-making process, in that the change was put upon them without any warning before the decision was made. Moreover, they were left without representation in the decision-making process, in that they did not participate in determining the outcome of a decision that had a large impact on them. This is not to say that the Apache case was entirely transparent. There are clear indications from the messages on the list that conversations were held off-list. Likewise, there were misconceptions over what roles participants played and over participant affiliations. However, neither the process nor the result was questioned.

In conclusion, we have taken a first step toward understanding how license change processes impact software development processes by discovering and modeling the update process for the Apache License and the update to the contributor license agreement in the NetBeans project. We observed how differences in the processes used to introduce the changes influenced responses to them. To put these cases into context, NetBeans has undergone two license changes since the events described above, neither of which received significant push-back from the community. The first shifted the license to the CDDL.
The second was a move to dual-license NetBeans under the GPLv2. This second licensing shift was considered by Sun “at the request from the community” [50]. Unlike the introduction of the JCA, the GPL shift was presented to the community by Sun for feedback (in August 2007) as an added option (rather than a complete relicensing) before the change was made. Thus, we can clearly see further change in the processes used to govern the community, in a way that directly addressed the defects in the project's governance processes circa 2003. Shah [51] echoes these concerns, observing that code ownership by firms creates the possibility that non-firm-employed contributors will be denied future access to project code. In other projects, such threats can lead to forking of the source, as happened when the MySQL corporation was purchased by Sun Microsystems, which, in turn, has recently been acquired by Oracle.

Acknowledgements

The research described in this report is supported by grants from the Center for Edge Power at the Naval Postgraduate School, and the National Science Foundation, #0534771 and #0808783. No endorsement implied.

References

[1] Scacchi, W.; Alspaugh, T.; and Asuncion, H. The Role of Software Licenses in Open Architecture Ecosystems, Intern. Workshop on Software Ecosystems, Intern. Conf. Software Reuse, Falls Church, VA, September 2009.

[2] Lindman, J.; Paajanen, A.; and Rossi, M. Choosing an Open Source Software License in Commercial Context: A Managerial Perspective, in Proceedings of the 36th EUROMICRO Conference on Software Engineering and Advanced Applications, pp. 237-244, 2010.

[3] St. Laurent, A. M. 2004. Understanding Open Source and Free Software Licensing. O'Reilly Media, Inc., Sebastopol, CA.

[4] Rosen, L. 2005. Open Source Licensing: Software Freedom and Intellectual Property Law. Prentice Hall.

[5] Lerner, J. and Tirole, J. 2005. The Scope of Open Source Licensing. The Journal of Law, Economics, & Organization, 21(1): 20-56.

[6] Di Penta, M.; German, D.; Guéhéneuc, Y.; and Antoniol, G. 2010. An exploratory study of the evolution of software licensing. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1 (ICSE '10), Vol. 1. ACM, New York, NY, USA, 145-154.

[7] Goldman, R. and Gabriel, R. 2004. Innovation Happens Elsewhere: How and Why a Company Should Participate in Open Source. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.

[8] Oreizy, P. Open Architecture Software: A Flexible Approach to Decentralized Software Evolution. Ph.D. in Information and Computer Sciences, Irvine, CA, University of California, Irvine, 2000.

[9] Jensen, C. and Scacchi, W. 2005. Process Modeling Across the Web Information Infrastructure. Software Process: Improvement and Practice 10(3):255-272.

[10] Hedhman, N. Mailing list message dated 16 Dec 2004 07:18:55 -0000 “Re: [ANN] Avalon Closed,” available online at http://www.mail-archive.com/community@apache.org/msg03889.html, last accessed 15 September 2009

[11] Dailey, D.
Mailing list message dated Wed, 02 May 2007 10:38:26 -0400 “Re: Support Existing Content / consensus through attrition?” available online at http://lists.w3.org/Archives/Public/public-html/2007May/0214.html, last accessed 15 September 2009

[12] The Protégé Ontology Editor Project, available online at http://protege.stanford.edu/ [last accessed 23 June 2008]

[13] The Firefox Ontology Plugin project, available online at http://rotterdam.ics.uci.edu/development/padme/browser/ontology [last accessed 23 June 2008]

[14] The Zotero Project, available online at http://www.zotero.org/ [last accessed 23 June 2008]

[15] Bringer, J. D.; Johnston, L. H. and Brackenridge, C. H. Using Computer-Assisted Qualitative Data Analysis Software to Develop a Grounded Theory Project. Field Methods, 2006, 18(3): 245-266

[16] Kelle, U. Theory Building in Qualitative Research and Computer Programs for the Management of Textual Data. Sociological Research Online, 1997, 2(2), available online at http://www.socresonline.org.uk/socresonline/2/2/1.html [last accessed 23 June 2008]

[17] Fielding, R. Mailing list message dated Sat, 08 Nov 2003 02:39:09 GMT “Review of proposed Apache License, version 2.0,” available online at http://mailarchives.apache.org/mod_mbox/archive-license/200311.mbox/%3cBAAB287A-1194-11D8-842D-000393753936@apache.org%3e, last accessed 14 August 2009

[18] Board meeting minutes of The Apache Software Foundation, January 2004, available online at http://apache.org/foundation/records/minutes/2004/board_minutes_2004_01_21.txt, last accessed 13 August 2009.

[19] Fielding, R. Mailing list message dated Sat, 24 Jan 2004 01:34:36 GMT “Apache License, Version 2.0,” available online at http://mailarchives.apache.org/mod_mbox/archive-license/200401.mbox/%3C781EEF08-4E0D-11D8-915D-000393753936@apache.org%3E, last accessed 13 August 2009

[20] Behlendorf, B. Mailing list message dated Sat, 22 Nov 2003 07:31:40 GMT “RE: termination with unrelated trigger considered harmful,” available online at http://mailarchives.apache.org/mod_mbox/archive-license/200311.mbox/%3C20031121232552.X38821@fez.hyperreal.org%3E, last accessed 13 August 2009

[21] Carlson, B. M. Mailing list message dated Sat, 8 Nov 2003 10:03:55 +0000 “Re: [fielding@apache.org: Review of proposed Apache License, version 2.0],” available online at http://lists.debian.org/debian-legal/2003/11/msg00053.html, last accessed 12 August 2009

[22] Peterson, S.K. Mailing list message dated Fri, 14 Nov 2003 14:52:54 GMT “termination with unrelated trigger considered harmful,” available online at http://mailarchives.apache.org/mod_mbox/archive-license/200311.mbox/%3C6D6463F31027B14FB3B1FB094F2C744704A11176@tayexc17.americas.cpqcorp.net%3E, last accessed 13 August 2009

[23] Machovec, J. Mailing list message dated Fri, 14 Nov 2003 16:49:09 GMT “Re: termination with unrelated trigger considered harmful,” available online at http://mailarchives.apache.org/mod_mbox/archive-license/200311.mbox/

[24] Fielding, R. Mailing list message dated Tue, 18 Nov 2003 02:10:27 GMT “Re: [fielding@apache.org: Review of proposed Apache License, version 2.0],” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200311.mbox/%3c60AEF3C1-196C-11D8-A8F4-000393753936@apache.org%3e, last accessed 13 August 2009

[25] Engelfriet, A.
Mailing list message dated Mon, 17 Nov 2003 20:59:53 GMT “Re: [fielding@apache.org: Review of proposed Apache License, version 2.0],” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200311.mbox/%3c20031117205953.GA95846@stack.nl%3e, last accessed 13 August 2009

[26] Moglen, E. Mailing list message dated Fri, 14 Nov 2003 21:28:32 GMT “FSF Comments on ASL 2.0 draft,” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200311.mbox/%3c16309.18688.540989.283163@new.law.columbia.edu%3e, last accessed 13 August 2009

[27] Carlson, B. M. Mailing list message dated Thu, 13 Nov 2003 05:39:49 GMT “DFSG-freeness of Apache Software Licenses,” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200311.mbox/%3c20031113053949.GD23250@stonewall%3e, last accessed 13 August 2009

[28] Armstrong, D. Mailing list message dated Fri, 14 Nov 2003 04:39:50 GMT “Re: DFSG-freeness of Apache Software Licenses,” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200311.mbox/%3c20031114043950.GM2707@donarmstrong.com%3e, last accessed 13 August 2009

[29] Johnson, P. Mailing list message dated Wed, 12 Nov 2003 02:09:14 GMT “Mutual defence patent clause,” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200311.mbox/%3c003d01c3a8c1$f9b55170$c6ba400c@protocol.com%3e, last accessed 12 August 2009

[30] Behlendorf, B. Mailing list message dated Wed, 12 Nov 2003 21:09:32 GMT “Re: Mutual defence patent clause,” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200311.mbox/%3c20031112130508.H497@fez.hyperreal.org%3e, last accessed 13 August 2009

[31] Fielding, R. Mailing list message dated 24 Dec 2003 04:16 AM “Re: Review of proposed Apache License, version 2.0,” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200312.mbox/%3c464B4006-3604-11D8-9A9F-000393753936@apache.org%3e, last accessed 12 August 2009

[32] Apache License Proposal Website, available online at http://www.apache.org/licenses/proposed/, last accessed 13 August 2009

[33] Apache License, Version 1.23, available online at http://mail-archives.apache.org/mod_mbox/archive-license/200312.mbox, last accessed 13 August 2009

[34] Behlendorf, B. Mailing list message dated Fri, 09 Jan 2004 22:42:52 GMT “Re: Review of proposed Apache License, version 2.0,” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200401.mbox/%3c20040109143803.G31301@fez.hyperreal.org%3e, last accessed 13 August 2009

[35] Behlendorf, B. Mailing list message dated Wed, 07 Jan 2004 22:16:36 GMT “Re: Review of proposed Apache License, version 2.0,” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200401.mbox/%3c20040107140658.A23429@fez.hyperreal.org%3e, last accessed 13 August 2009

[36] Fielding, R. Mailing list message dated Wed, 14 Jan 2004 20:25:50 GMT “Re: Review of proposed Apache License, version 2.0,” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200401.mbox/%3cD81EA136-46CF-11D8-B08A-000393753936@apache.org%3e, last accessed 13 August 2009

[37] Fielding, R.
Mailing list message dated Wed, 14 Jan 2004 20:54:26 GMT “Re: Review of proposed Apache License, version 2.0,” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200401.mbox/%3cD6DB9454-46D3-11D8-B08A-000393753936@apache.org%3e, last accessed 13 August 2009

[38] Armstrong, D. Mailing list message dated Sat, 24 Jan 2004 02:13:50 GMT “Re: Apache License, Version 2.0,” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200401.mbox/%3C20040124021350.GG3060@archimedes.ucr.edu%3E, last accessed 13 August 2009

[39] Fielding, R. Mailing list message dated Sat, 24 Jan 2004 02:29:29 GMT “Re: Apache License, Version 2.0,” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200401.mbox/%3C23385101-4E15-11D8-915D-000393753936@apache.org%3E, last accessed 13 August 2009

[40] Free Software Foundation Licenses webpage, available online at http://www.fsf.org/licensing/licenses/index_html#GPLCompatibleLicenses, last accessed 14 August 2009

[41] Massol, V. Mailing list message dated Sun, 25 Jan 2004 16:01:19 GMT “How to use the 2.0 license?,” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200401.mbox/%3C012f01c3e35c$78e229d0$2502a8c0@vma%3E, last accessed 13 August 2009

[42] Behlendorf, B. Mailing list message dated Sun, 25 Jan 2004 20:17:06 GMT “Re: How to use the 2.0 license?,” available online at http://mail-archives.apache.org/mod_mbox/archive-license/200401.mbox/%3C20040125121456.H396@fez.hyperreal.org%3E, last accessed 13 August 2009

[43] Adams, E. NBDiscuss mailing list message: “Joint Copyright Assignment,” available online at http://www.netbeans.org/servlets/ReadMsg?list=nbdiscuss&msgNo=2228, last accessed 6 August 2009

[44] The Apache Software Foundation Individual Contributor License Agreement, Version 2.0, available online at http://www.apache.org/licenses/icla.txt, last accessed 20 October 2009.

[45] Brabant, V. Mailing list message dated Tue, 15 Jul 2003 18:52:36 +0200 “[nbdiscuss] Re: licenses and trees,” available online at http://www.netbeans.org/servlets/ReadMsg?listName=nbdiscuss&msgNo=2547, last accessed 20 October 2009.

[46] West, J. and O'Mahony, S. 2005. Contrasting Community Building in Sponsored and Community Founded Open Source Projects. In Proceedings of the 38th Annual Hawaii International Conference on System Sciences - Volume 07 (January 03-06, 2005). HICSS. IEEE Computer Society, Washington, DC, 196.3.

[47] Lerner, J. and J. Tirole. 2000. The simple economics of open source. NBER Working Paper Series, WP 7600, Harvard University, Cambridge, MA.

[48] von Hippel, E. and von Krogh, G. 2003. Open source software and the 'private-collective' innovation model: Issues for organizational science. Organization Science, 14(2):209-223.

[49] Hedhman, N. Mailing list message dated Sun, 29 Jun 2003 13:31:48 +0800 “[nbdiscuss] Re: licenses and trees (was: Anti-Sun Animosity),” available online at http://www.netbeans.org/servlets/ReadMsg?listName=nbdiscuss&msgNo=2578, last accessed 21 October 2009.

[50] NBDiscuss mailing list message, available online at http://www.netbeans.org/servlets/ReadMsg?list=nbdiscuss&msgNo=3784, last accessed 28 February 2009.

[51] Shah, S.K. 2006.
Motivation, governance and the viability of hybrid forms in open source software development. Management Science, 52(7), 1000-1014.
{"id": "84203e867ebd8e0dad6d9c93543c59855698a90d", "text": "\u201cThey Can Only Ever Guide:\u201d How an Open Source Software Community Uses Roadmaps to Coordinate Effort\n\nDANIEL KLUG, CHRISTOPHER BOGART, and JAMES D. HERBSLEB, Carnegie Mellon University, USA\n\nUnlike in commercial software development, open source software (OSS) projects do not generally have managers with direct control over how developers spend their time, yet for projects with large, diverse sets of contributors, the need exists to focus and steer development in a particular direction in a coordinated way. This is especially important for \u201cinfrastructure\u201d projects, such as critical libraries and programming languages that many other people depend on. Some projects have taken the approach of borrowing planning tools that originated in commercial development, despite the fact that these techniques were designed for very different contexts, e.g. strong top-down control and profit motives. Little research has been done to understand how these practices are adapted to a new context. In this paper, we examine the Rust project\u2019s use of roadmaps: how has an important OSS infrastructure project adapted an inherently top-down tool to the freewheeling world of OSS? We find that because Rust\u2019s roadmaps are built in part by summarizing what motivated developers most prefer to work on, they are in some ways more a description of the motivated labor available than they are a directive that the community move in a particular direction. They allow the community to avoid wasting time on unpopular proposals by revealing that there will be little help in building them, and encouraging work on popular features by making visible the amount of consensus in those features. Roadmaps generate a collective focus without limiting the full scope of what developers work on: roadmap issues consume proportionally more effort than other issues, but constitute a minority of the work done (i.e issues and pull requests made) by both central and peripheral participants. They also create transparency among and beyond the community into what central contributors\u2019 plans are, and allow more rational decision-making by providing a way for evidence about community needs to be linked to decision-making.\n\nCCS Concepts: \u2022 Human-centered computing \u2192 Open source software; \u2022 Social and professional topics \u2192 Sustainability.\n\nAdditional Key Words and Phrases: collaboration; common pool resources; open source; Rust language\n\nACM Reference Format:\nDaniel Klug, Christopher Bogart, and James D. Herbsleb. 2021. \u201cThey Can Only Ever Guide:\u201d How an Open Source Software Community Uses Roadmaps to Coordinate Effort. Proc. ACM Hum.-Comput. Interact. 5, CSCW1, Article 158 (April 2021), 28 pages. https://doi.org/10.1145/3449232\n\n1 INTRODUCTION\n\nOpen source software (OSS) has come to fulfill an infrastructure role in the economy. Eghbal [26] highlights OSS projects such as MySQL and Ruby that both OSS and industrial projects depend on heavily, but are themselves non-profit OSS projects. To fulfill an infrastructural role, there needs to be careful coordination among maintainers and users of the infrastructure [68], who are doing the work on behalf of different companies or foundations, or perhaps as volunteers. 
Good coordination is especially important for infrastructure projects, since by definition the project is an essential underpinning of many other projects: poorly-considered changes can damage these stakeholders more than they would if the project were merely an incidental dependency of other projects, one they could simply swap out for an alternative. Coordination of work in self-organizing systems [27] poses a difficult and important problem in CSCW.

How can software infrastructure projects ensure that they will not only be maintained in the future, but will preserve values that their users depend on? Unlike in commercial software development, in OSS “developer community” [78] software projects there is no manager who has direct control over which features or attributes developers choose to spend their time on, yet these projects still need to somehow coordinate, stabilize, and make visible their development priorities. Open source projects do have governance, but governance models do not generally dictate what features will be added and when. For example, even in the highly orchestrated work on the Linux kernel, there are multiple coordination processes, driven by the open source norm that contributors self-select their tasks [75].

Much preexisting work in CSCW has focused on the tensions between infrastructure contributors’ work on infrastructure and their own priorities, often driven by the primary work they do that the infrastructure is intended to support. For example, in scientific software written by academic collaborators, short-term paper deadlines can lead people to focus on needed new features over long-term maintainability [68]; on the other hand, infrastructure development can offer contributors new opportunities, leading them to realign their own priorities [11], perhaps helping build consensus. Researchers have identified a broad spectrum of ways that OSS communities can organize themselves to coordinate development and avoid tragedy-of-the-commons problems [50], but in some cases preexisting social networks among contributors drive much of the work done [46]. Some OSS projects have taken the approach of borrowing planning tools that originated in commercial development, such as milestones and issue tracking (e.g. Scala\(^1\)), beta testing (e.g. PostgreSQL\(^2\)), or roadmaps (e.g. Rust), despite the fact that these techniques were originally conceived for very different contexts, i.e. strong top-down control and profit motives, in which executives and managers make final decisions about goals and timelines, and rank-and-file developers are responsible for carrying out these plans. Developers in open source communities, in contrast, are often free to choose their own tasks, so this bottom-up power may have an impact on how planning tools work in the open-source world.

Investigating how diverse OSS projects attempt to shape collaboration in a stable, visible way requires considering the bottom-up forces at work: developers’ motivation about whether and how to contribute, users’ motivation to choose, support, or influence development, and factors that make one project survive while another fails [98], as well as the top-down techniques leadership employs in projects despite the relative lack of power that OSS leaders have over their communities [75].

In this research we investigate how consensus around a community’s direction is constructed, maintained, and evaluated.
We approach this by considering how roadmaps, as an originally top-down technique from industry, are adapted and reconfigured to work for an OSS project. Roadmaps can be understood as a layout of existing plans to make future decisions. They are usually a visualization of further steps [97] intended to be open to later revision [79]. We do not investigate what effect the choice of roadmapping had relative to some other method of coordination the community could have chosen, or the process of deciding on the use of roadmaps in the first place, but rather how the community carried out the particular method it did choose, and that method’s immediate effects during one iteration. We look at an OSS roadmap’s creation, how it is applied, and how the community evaluates its impact, by addressing the following research questions:

\(^1\)https://github.com/scala/scala/milestones
\(^2\)https://www.postgresql.org/developer/beta/

**RQ1.** What functions does a roadmap serve in an open source community?

**RQ2.** How does an open source community use a roadmap to fulfill those functions?

Our results show that although a roadmap appears superficially to be an edict from project leaders specifying where resources should be applied, it in fact reflects a consensus among active developers about where they wish to apply their efforts. Its power derives not just from the core developers’ ability to accept or reject changes, but also from the reassurance it gives a would-be contributor that productive developers are already motivated to collaborate with them, if they stick to roadmap-related topics. The roadmap-building process helps these developers reach consensus, and community members use the roadmap throughout the year as a rhetorical resource to cut off digressions and to signal intention to cooperate with community goals.

## 2 BACKGROUND

These research questions address an apparent mismatch between the idea of volunteers coming together to do work that motivates them, and roadmaps as plans that on their surface appear to be telling people what to do. Prior research has only partly explained how open source collaborators set directions, and literature on roadmapping in corporate settings appears to reveal little about how roadmapping applies to volunteer projects. In this section we describe prior research in both these areas.

### 2.1 The Problem of Coordinating Developer Effort in Open Source Software

In recent years, the use of OSS has become pervasive [35] among software developers, resulting in great economic value of OSS [20, 34] which is, however, largely invisible to the public. Although OSS is often critical infrastructure [26], it is managed very differently from traditional infrastructure. Its users can freely distribute, access, adapt, modify and redistribute source code for their own and for community use. Analyses of OSS projects from various social and organizational perspectives have shown that managing such a project requires taking into account developers’ distinct motivations for contributing [5, 15, 38], benefits and rewards of contributing [13, 44, 54], preferred levels of involvement [4, 62], building and managing social capital [66, 80], networking [60, 76, 77], and differing communication and interaction strategies [6, 19, 33].

The varying motivations and characteristics raise the question of how OSS communities coordinate to agree on and work towards common goals.
We define “coordination” as many individuals deciding how to work together effectively; that is, how to choose tasks that amount to collective progress in a mutually agreeable direction, as opposed to working at cross-purposes. OSS contributors and maintainers often work in a distributed and decentralized way, with very little hierarchy or institutional structure [22, 99], and are more likely to engage in projects and tasks based on personal interests [5]. Coordinating and organizing work in OSS projects therefore involves matching the demand for effort (desired features and known bugs that will take time and specialized skills to fix) with the supply of effort (volunteers and paid developers who have their own motivations and priorities).

#### 2.1.1 Supply of and Demand for Development Work.

Like any software, OSS requires maintenance “to correct faults, improve performance or other attributes, or adapt to a changed environment” [48]. Unfulfilled demand for maintenance may render regular software obsolete. But for infrastructure, the ramifications of insufficient maintenance are magnified because other projects and their users rely on the infrastructure; thus the demand for development effort is greater, coming from a large dependent pool of projects and users. Prior research shows that the demand for maintenance work, such as issue fixes, testing, and documentation, may depend on many factors: for example, the size of the user base for a particular feature [56], or the extent of upstream or interdependent projects [12]. Research on managing OSS requirements [73, 103] shows how demand is discovered, analyzed, prioritized, and validated within discussions and issue requests. Popular projects need help triaging user-reported issues [2, 104]. Infrastructures typically also need coordinators [65] who ensure that individual projects have the features needed for an infrastructure-wide release.

Skilled volunteers are motivated by factors such as their strength of identification with the community [38], internal (e.g., self-use) and external (e.g., reputation) motivations [36, 45], a desire to learn [102, 105], or long-term “hobbyist” status, in which developers become more deeply involved and play a critical role in long-term viability [74]. Developers hired by industry also play an increasing role in OSS development [28]. Firms are more likely to pay developers to participate as a way of sharing the cost of innovation, creating demand for their complementary products or services, establishing their technology as a de facto standard, or attracting improvements or complements for their products [100]. However, industry support for OSS projects carries some risk of discouraging volunteers, though this can be mitigated by transparency in decision-making [101] and negotiation of governance, membership, ownership, and control over production [58].

2.1.2 Matches and Mismatches in Effort Supply and Demand. Participants in OSS infrastructure are generally free to contribute anywhere. These individual decisions bring about an emergent allocation of effort across projects. But besides the decision-making of individual participants, it is unclear what mechanism influences participants to apply effort where there is the greatest need. In contrast, it is clear that in commercial firms participating in markets, the forces of supply and demand determine price, a strong signal guiding the allocation of resources [9].
Economists Dalle & David [21] were puzzled about “how, in the absence of directly discernible market links between the producing entities and ‘customers,’ the output mix of the OSS sector of the software industry is determined. Yet, to date, the question does not appear to have attracted any significant research attention”. We were unable to find research that addresses this issue in the years since then. The study of requirements management points out the difficulties of discovering, articulating, and implementing needed features even when development effort is plentiful [103]. The lack of development effort has been documented in the highly publicized Heartbleed bug [23], but we are not aware of systematic studies of under-supply or of how to recognize and address it. In total, the research seems to support the conclusion that there is currently no general mechanism closing the gap between demand for and supply of effort, other than the perceptions and decision-making of individual developers. Yet infrastructure and effort mismatch are difficult for participants to see [68].

2.1.3 Organizing and Allocating Work in Open Source Software Projects. OSS project leaders face tradeoffs between openness and fostering a productive collaboration. Decision-making behind closed doors can cause conflict that discourages volunteers, since they may feel their preferences are not being considered [40]. But too much visibility into disagreements among leadership can also lead to uncertainty among volunteers that decisions may not be firm and their contributions may not end up being used [82].

With often only partial control and limited means of enforcement, OSS project leaders may rely on social factors such as their technical reputation and community traditions to promote a vision of the project’s direction [55, 64]. Publishing schedules and roadmaps can help get developers to identify with and take responsibility for community goals [64]. Leaders may develop formal policies and guidelines for collaboration to give structure to developers’ work [40], and may assert the authority to reject additions in a given software release [55].

Prior research has identified implicit ways that core members influence newcomers and peripheral members to adopt cultural norms and practices. Hemetsberger and Reinhardt [37] describe a number of mechanisms that core members of open-source projects such as KDE use to enculturate peripheral members: for example, a project’s manifesto\(^3\) may discourage non-like-minded contributors, and KDE’s leaders enforce norms through mailing list discussions and code review processes. Crowston and Shamshurin [18] showed that core members of successful Apache incubator projects were more communicative than those in unsuccessful projects, and were more likely to use pronouns in a way that suggested inclusiveness of the peripheral community.
Gallivan [32], however, argues that rigorous control, standardization, and measurability (“McDonaldization”) help open source projects achieve common objectives in virtual, distributed environments where trust relationships are difficult to form; in particular, despite potentially many mutual trust relationships in open-source communities, control is a one-directional relationship from core to periphery.

2.2 Roadmaps in Commercial and OSS Development

Roadmaps are plans for the use of resources over time, often created in iterative and reflective processes [61] and intended to be open for changes [79]. The goal is to lay out existing plans and future decisions, and to visualize further steps [79, 97] that may be revised based on project results [41]. In commercial contexts, developer resources and needs are coordinated explicitly by management, and roadmaps are a tool to create, implement, and manage software in alignment with company strategies, product life-cycles, and audiences [24, 30, 96].

In Software Product Management (SPM), roadmaps are a communicative tool for knowledge sharing [81], consensus-reaching, and individual interpretation of goals by people involved in development processes [47]. For example, product roadmaps present features to manage product stages [49, 96], select and assign requirements [25], and connect teams to ensure the success of a product within a larger time frame [30, 96]. To create roadmaps, information about audiences, their characteristics, and needs is usually collected beforehand [7]. As a communicative tool, roadmaps describe what will be (or should be) achieved in which way in a project, and how it will meet business objectives [57].

Many OSS projects generate roadmap documents, including large OSS communities such as React [67], Facebook Libra [84], Scala [85], and Qt [95], as well as industry-produced OSS such as AWS CloudFormation [14] and industrial coalitions like Open Service Broker [3]. These roadmaps appear to have varying roles in the communities. Some seem to have multiple versions, as if they are being maintained and revisited, while others are one-time descriptions of envisioned future features. However, it is difficult for a casual observer to tell what importance these roadmap documents play. In this research we choose the Rust Language community as a particular example to examine its use of roadmaps.

3 CASE STUDY: ROADMAPS IN THE RUST LANGUAGE PROJECT

Based on our theoretical propositions, we selected the Rust programming language as a single-case study. It is appropriate both because of its popularity and its openness. Its popularity as infrastructure means that there are many users who may pressure participants to make and implement good choices about features and priorities. Its openness means that a rich variety of data about the Rust compiler community’s working and decision-making processes is available from blogs, forums, and GitHub repositories. Thus we have the opportunity to study in great detail a community making and implementing consequential choices together. This constitutes what Yin [106] calls a “revelatory case”, as it provides “an opportunity to observe and analyze a phenomenon previously inaccessible to social science inquiry.”

\(^3\)https://manifesto.kde.org/

The Rust programming language has been growing into the role of a popular and important part of the software infrastructure [59]: many individuals, subteams, and outside organizations have a stake in its future.
Rust is promoted as being suitable for infrastructural code where performance and reliability are important, such as web browser engines or hardware devices with limited resources; for that reason it is used by numerous big tech companies [70], such as Facebook and Mozilla. The Rust community is organized in teams [69] and work groups. It has a large and active social community, with a variety of blogs, chat rooms, forums, GitHub discussion threads, and in-person conferences and meetings worldwide.

The Rust community adopted, then evolved, a roadmapping process, adding to the purposes that the roadmap serves over time. After the release of version 1.0 in 2015, the Rust core team initiated a process to organize and prioritize future work and to define future goals in all areas of Rust, citing a need to sequence feature additions to avoid later rework, and to prioritize features that would solve many problems or benefit many users [51]. In 2016, the Rust team refined their RFC (request for comments) process; RFCs are documents proposing significant changes to the project [93]. An overarching roadmap process was added to define initiatives as rallying points with concrete goals, fixed time frames, and clear commitments from individuals. This process involves building consensus in the community on project-wide goals, then proposing these goals for community discussion through an RFC, and finally advertising and publishing the agreed-upon goals as a yearly roadmap.

The Rust core team [69] released the first Rust roadmap in February 2017 [94]. To create the roadmap, the core team gathered priorities through a Rust community survey [92] and a commercial user survey with companies using Rust [91]. For the 2018 roadmap, in addition to the annual survey [83], the core team asked the Rust community to blog and post ideas for Rust in the next year [87]. The Rust community submitted 100 blog posts with suggestions for the roadmap. The core team then collected and incorporated the suggestions into an RFC for discussion and review [71], and released the roadmap in March 2018 [88]. The 2019 roadmap followed a similar process [86]: building on 73 community blog posts [72], a survey [90], and the RFC discussion, the core team created the roadmap and released it in March 2019 [89]. Unlike previous years, the 2019 roadmap was explicitly organized around Rust’s team structure, and made explicit mention of those teams having their own roadmaps.

The process thus evolved over four years to more thoughtfully sequence development, to prioritize the worst problems and the most users, to elicit both broad (survey) and deep (narrative blog post) input from the community, to devolve some planning to the separate teams in the form of team-specific roadmaps, and, finally, to ensure that chosen initiatives are not only needed, but actually supported by people willing to commit to working on them.

3.1 The 2018 Rust roadmap

The 2018 roadmap, available at https://github.com/rust-lang/rfcs/blob/master/text/2314-roadmap-2018.md, lays out four major goals: shipping a ‘Rust 2018’ edition of the language, creating more documentation support for intermediate-level Rust programmers, encouraging the global spread of Rust by adding internationalization support and links with local Rust groups, and finally, strengthening the compiler’s work teams and their leadership.
The document goes on to identify several concrete things that need to be done to support those areas. The ‘Rust 2018’ edition release that is the roadmap’s first goal focuses on support for four identified use cases for the language: network services, WebAssembly (i.e. use in web browsers), command line applications, and use in embedded devices.

The document also specifies a rough schedule for the year: design work in February and March 2018, focusing on RFCs; “buckling down” in April through July, focusing on development work; “Fun!” in August through November, focused on forward-looking, exploratory features; and “Reflection” in December.

The document ends with a brief discussion of ‘rationale, drawbacks, and alternatives’.

3.2 Other Rust documents

The Rust project publishes a great many documents defining their product, their community, and its governance. Documents that are somewhat standard for open source projects, available at the project’s GitHub site at https://github.com/rust-lang/rust, include a “README.md” telling users what Rust is and how to install it, copyright and license files positioning the work legally, “CONTRIBUTING.md” and “CODE_OF_CONDUCT.md” files laying out high-level community norms for how developers can contribute and how they are expected to interact, and “RELEASES.md” describing the change history of the project at a high level. Beyond that, the project provides a wealth of deeper information, including “The Rust Programming Language”\(^4\), which teaches the language itself, and the “Guide to Rustc Development”\(^5\), which teaches how the compiler works and goes into great depth about contribution norms and governance.

\(^4\)https://doc.rust-lang.org/book/index.html
\(^5\)https://rustc-dev-guide.rust-lang.org/

As of September 2019, beyond the compiler project itself, the Rust community had 147 other GitHub repositories under its organizational umbrella, including the collection of RFCs and the discussions around them (https://github.com/rust-lang/rfcs; the annual roadmaps are found among these RFCs); the other repositories hold auxiliary tools, bots, websites, and documents.

4 METHODOLOGY

Understanding how communities work is often a complex research matter that requires large-scale data collection. Our research benefits from the Rust community being very open and communicative; they produce many publicly accessible artifacts that document community and software related activities. Therefore, a high volume of data is available to researchers about how the community builds, maintains, and evaluates consensus about its direction.

4.1 Data Collection

To analyze what functions a roadmap serves for the Rust community and how they use it to fulfill those functions, we collected the following publicly available data produced by the Rust community.

4.1.1 Yearly Rust Roadmaps. We focused on the community-wide 2018 Rust roadmap and collected the official roadmap document [88]. Because the Rust community introduced its first roadmap for 2017, analyzing the 2018 roadmap allows us to look at both the preceding and the following year’s roadmap, and so to include the community’s own reflection on how the roadmap was used. We collected 97 of the 100 blog posts [71] (3 were no longer retrievable) submitted by Rust community members during the process of creating the 2018 roadmap, written in response to the core team’s call for goals and directions for Rust in 2018.

4.1.2 Direct records of Rust compiler project work.
The Rust community uses the RFC process to find consensus on proposed substantial changes to the language, standard libraries, and also to community standards. Issues and PRs (pull requests: i.e. proposed specific edits to code) are often linked to RFCs and show where the actual coding work of all contributors happens and to what Rust contributors allocate their time and effort. Comments in these RFCs, issues, and PRs involve discussions among contributors and teams. We scraped all code and discussion contents of GitHub repositories associated with the Rust compiler project from Jan 1, 2018 to Dec 31, 2018, the time frame for the 2018 roadmap. This data allowed us to analyze how much of which kind of work (coding work and discussion work) by which people (core or peripheral people) adhered to the topics called for in the roadmap.

Fig. 1. We gathered software engineering artifacts, GitHub comments, blog posts, and email interviews. We analyzed software engineering artifacts and a set of pre-roadmap blog posts for roadmap-relevant content. We analyzed GitHub comments, chats, blog posts, and interview text through qualitative coding, and statistically analyzed Likert-scale answers in the email interviews. We describe the functions and mechanisms of the roadmap by drawing on all three types of analysis.

4.1.3 Records of argumentation and discussion. To understand how participants used the roadmap as a resource for argumentation during the year to affect decisions and priorities, we collected excerpts from across several communication channels used by the Rust community (Table 1) in which people explicitly mentioned the roadmap (i.e. explicit mentions of the word “roadmap” or “road map”; a sketch of this extraction step follows the list):

- **Compiler project work** We extracted roadmap mentions from the corpus of RFC, issue, and PR discussions described above, excluding any mentions in the roadmap’s own RFC#2314 (https://github.com/rust-lang/rfcs/pull/2314).

- **Posts in Rust blogs and forums** Some participants in the Rust project, as in many OSS communities, maintain personal and official community blogs to post about updates, goals, ideas, or critical thoughts. To gather samples of participants explicitly using the existence and content of the roadmap as a resource in argumentation about the project direction, we searched for roadmap mentions in posts of the main publicly accessible Rust blogs (Rust Blog (https://blog.rust-lang.org), Inside Rust Blog (https://blog.rust-lang.org/inside-rust), Read Rust (https://readrust.net), This Week in Rust (https://this-week-in-rust.org)) and the Rust Internals forum (https://internals.rust-lang.org) from Jan 1, 2018 to Apr 23, 2019. This time period was extended past the end of the year specifically to include posts advocating for content for the 2019 roadmap, since they might contain reflections about the 2018 roadmap and its content. The 2019 call for roadmap blog posts explicitly asked Rust contributors to reflect on Rust in 2018 [86].

- **Online team meetings:** As an OSS community, Rust contributors are characteristically distributed all over the world, which is why meetings are mainly held online. The Rust compiler team holds weekly meetings on the collaborative chat software Zulip (https://rust-lang.zulipchat.com) to update, manage, monitor, and plan work, in working groups and throughout the larger community. Zulip conversations are semi-public: members need to create a free account and log in to participate, thus setting a low barrier to reading or contributing to the discussions. Anticipating that team members and contributors might use these online meetings to discuss matters related to roadmaps and roadmap processes, we searched for roadmap mentions in Rust team meetings held on Zulip starting from Jan 1, 2018 and extending a few months beyond the end of 2018, to Apr 23, 2019, so as to also include reflection on the 2018 roadmap that happened in early 2019.
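The study’s extraction tooling is not published with the paper; the following minimal Python sketch illustrates the mention search above and the chronological participant anonymization described below (P001, P002, ...). The `documents` collection and its `source`, `author`, and `text` fields are hypothetical stand-ins for the scraped GitHub comments, blog and forum posts, and Zulip messages.

```python
# Minimal sketch of the roadmap-mention extraction (Section 4.1.3).
# `documents` and its fields are hypothetical stand-ins for the scraped data.
import re

MENTION = re.compile(r"\broad\s?map\b", re.IGNORECASE)  # "roadmap" or "road map"

def extract_mentions(documents):
    """Yield documents that explicitly mention the roadmap."""
    for doc in documents:  # assumed sorted by timestamp
        if MENTION.search(doc["text"]):
            yield doc

def anonymize(excerpts):
    """Assign P001, P002, ... in order of first appearance across sources."""
    ids = {}
    for doc in excerpts:
        pid = ids.setdefault(doc["author"], f"P{len(ids) + 1:03d}")
        yield {"participant": pid, "source": doc["source"], "text": doc["text"]}
```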
Table 1. Total number of data items collected from each source, and number of extracted excerpts from each source that mention “roadmap”

| | RFC, issue, and PR comments on GitHub | Blog and forum posts | Blog posts reflecting on roadmap | Messages in Zulip chat threads | Total |
|------------------------|---------|-------|----|--------|---------|
| Data collected | 135,234 | 3,394 | 73 | 58,901 | 197,602 |
| Mentions of “roadmap” | 59 | 110 | 28 | 144 | 341 |

In the textual data collected from GitHub comments, online meetings, and blog posts, we identified a total of 118 participants by name and username who made at least one comment regarding roadmaps. We anonymized participants (P001, P002, ..., P118) chronologically by appearance in the different data sources. Five participants were core team members, 28 were members of other teams, 85 were non-team members, and five were identified as working group members (see Fig. 2).

4.1.4 Email Interviews. In addition to our data mining, we conducted short, structured email interviews with Rust contributors to contextualize some of our findings about the two research questions. We generated a sample of community members stratified by level of community involvement. To find highly involved members, we collected a list of all Rust team members and all blog post authors for the 2018 roadmap (99 people at the time of the sampling). For the less-involved members, we chose a random sample of the same size out of all other committers to the compiler project who listed emails on their GitHub profiles. After later data cleaning (removing people with multiple or invalid emails), we ended up with a list of 190 candidates. We mailed the interview to those candidates, and 39 people responded (20.5% response rate). 24 of those identified themselves as belonging to a Rust team, and 15 said they did not (see Fig. 2). As the email interviews were conducted anonymously, we could not match participants with our existing list of participants in Rust forums. Therefore, interview participants were anonymized and numbered separately (PS001, PS002, ..., PS039). The interview questions asked Rust contributors about their experience with and opinions on all Rust roadmaps of any year. The questions are shown in Appendix A.

4.2 Data Description and Analysis

Our case study includes data collected from GitHub to reconstruct the allocation of effort in code work, textual data from several Rust community sources to analyze the communicative aspects of creating and using roadmap documents and discussing work effort related to roadmap topics, and answers from structured email interviews with Rust community members to triangulate results obtained from the collected textual data.
To analyze how the Rust community creates, uses, and evaluates roadmaps, we decided to follow a mixed-method approach, as quantitative or qualitative methods by themselves could not sufficiently address our research questions [16]. We simultaneously used quantitative and qualitative data collection methods and followed a convergent approach, analyzing the data sets separately and then combining the results in the interpretation. Following this methodological approach, our goal was to generate a complete and deep understanding [17] of how roadmaps are used to discuss and allocate effort.

We used a quantitative technique to estimate the proportion of work done during the year that was relevant to the community-wide 2018 roadmap. We developed a roadmap topic heuristic for determining whether a given piece of text was relevant to topics mentioned in the roadmap. The purpose of the heuristic was to give us an objective way of saying whether a unit of discussion or coding was part of the roadmap or not, and secondarily, which part of the roadmap it pertained to. The heuristic starts with some hand-written regular expressions built around topics we found in the 2018 roadmap, and identifies text by applying those regular expressions, and also by making inferences about the topics of “related” items, for example inferring that an issue that claims to track an RFC probably addresses the same topic as the RFC does. Its output is a list of all issues, pull requests, RFCs, and commits, tagged as “in roadmap” or “not in roadmap”. The algorithm is described in detail in Appendix B (a minimal sketch follows below). We applied this heuristic to create two datasets:

- To identify where ideas in the roadmap came from, we applied this heuristic to the 97 retrievable blog posts that answered the 2018 Rust call for roadmap blogs, generating a mapping between 2018 roadmap topics and the blogs which the core team drew on in preparing the roadmap. We also identified whether each post was written by a member of a Rust team.

- To estimate the influence of the roadmap on work done throughout 2018, we applied the heuristic to all Rust project issues and Rust project PRs, creating a data set consisting of one record per PR or issue, tagged with: a (possibly empty) set of roadmap topics, the context of discussion (issue or PR), the type of contributor (Rust team member or not), and two measures of work effort: discussion work and coding work. Discussion work was operationalized as the number of characters of English text in PR and issue discussion threads (after removing code snippets); coding work was operationalized as lines of code added or removed in the Rust project commits associated with PRs.

These datasets distinguish individual participants as “team” vs “non-team”: we defined these by scraping the membership of all Rust teams (Figure 2) from the project’s governance page.\(^6\)

Fig. 2. We classify Rust community members as “team” (191 people) or “non-team” (other participants, whether contributing code or other effort), depending on whether they were listed on some team at https://www.rust-lang.org/governance on January 3, 2019. Although organizational literature often refers to “core” and “peripheral” members, to avoid confusion we use the word “core” for the 9-person team the Rust governance page identified as the “core team”, “team” to refer to the 191 members of teams (including core), and “non-team” for the larger community periphery.

\(^6\)https://www.rust-lang.org/governance, in January 2019, as retrieved from https://web.archive.org/web/20190103220022/https://www.rust-lang.org/governance
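To make the heuristic concrete, here is a minimal sketch of its two steps (pattern matching, then topic propagation through “related” items). The topic patterns shown are illustrative stand-ins, not the study’s actual regular expressions (those are in Appendix B), and the `items` structure is hypothetical.

```python
# Minimal sketch of the roadmap topic heuristic (full algorithm in Appendix B).
# Patterns and the `items` structure are illustrative assumptions.
import re

# Hand-written patterns built around topics found in the 2018 roadmap.
TOPIC_PATTERNS = {
    "edition-2018": re.compile(r"\b(rust\s+)?2018\s+edition\b", re.I),
    "webassembly": re.compile(r"\b(webassembly|wasm)\b", re.I),
    "embedded": re.compile(r"\bembedded\b", re.I),
}
# An issue that claims to track an RFC inherits that RFC's topics.
TRACKS_RFC = re.compile(r"tracking\s+issue\s+for\s+rfc\s*#?(\d+)", re.I)

def tag_items(items):
    """Tag each issue/PR/RFC as "in roadmap" or "not in roadmap"."""
    topics = {}
    for item in items:
        text = item["title"] + " " + item["body"]
        topics[item["id"]] = {name for name, pat in TOPIC_PATTERNS.items()
                              if pat.search(text)}
    # Inference step: propagate topics from tracked RFCs to their issues.
    for item in items:
        match = TRACKS_RFC.search(item["body"])
        if match:
            topics[item["id"]] |= topics.get("rfc-" + match.group(1), set())
    return {item_id: ("in roadmap" if tags else "not in roadmap")
            for item_id, tags in topics.items()}
```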
As a supplemental check on the sources of ideas in the roadmap, we manually inspected the ten commits to the 2018 Rust roadmap document in the GitHub Rust RFC project and summarized the changes, looking for introductions of new topics (none were found). This was a small, relatively informal effort, since a cursory check showed that little substantial change to the RFC had been made during the discussion. Complementing this quantitative technique, we also created a dataset of hand-coded roadmap mentions from project work, team meetings, and blogs. Table 1 shows the amount of data collected from each source. We extracted 341 excerpts that mentioned “roadmap” or “road map” from the collected data, tracking for each excerpt its author and source.

Table 2. Examples of applying codes to excerpts and sorting them into categories

| Excerpt | Code | Category |
|------------------------------------------------------------------------|-----------------------------|-----------------------------------------------|
| “a key step in any successful WG is going to be forming a **roadmap**” | point out need for a roadmap | creating a roadmap |
| “it’s not the kind of change that’s targeted for the roadmap this year” | rejecting an RFC | using roadmaps to decline allocating effort |

In our case study of textual data collected from GitHub comments, online meetings, and blog posts, we followed a qualitative content analysis approach [42, 52] to characterize what people said about roadmaps in the excerpts of these sampled Rust online artifacts.

We decided to use qualitative content analysis for our case study because the method is rooted in social research but is not linked to any particular science or concepts [43]. This makes it a very useful approach for studying documents and artifacts across various data sources [8]. Content analysis is profitable for mixed-method research as it comprises quantitative and qualitative methodology, and qualitative content analysis in particular allows the researcher to extract manifest and latent information from different textual data [10].

We used a data-driven open coding approach across all collected excerpts from the text-based data sources (GitHub comments, online meetings, blog posts) [52]. We performed inductive coding and created preliminary codes to construct a coding scheme while working through the qualitative data. Open codes from all data sources were then combined into larger categories. In total, we generated 91 codes (see Table 2 for code examples), which were then sorted into eight categories (see Table 3).

Throughout the open coding process, the research team ensured a shared common agreement on generated and applied codes. The coding of each textual data set (GitHub comments, online meetings, blog posts) was based on the consistent use of codes by one researcher and the subsequent review of generated and applied codes by a second researcher.
Little disagreement was found in this process; where it occurred, the two researchers met to review, discuss, and refine the disputed codes in relation to the data source and to the research question to which the coded excerpt related most. Through this discussion and refinement, all disagreements were resolved and codes were mutually validated. This way of ensuring validity in qualitative research through agreement is an established approach in the CSCW community [53] and matches our inductive coding approach for a qualitative case study across varying textual data sources.

In addition to analyzing blog posts and online meetings, the structured email interviews served to collect additional data to triangulate the results we observed [29]. Interview questions asked about how roadmaps influence decision-making, how helpful roadmaps are for the community, and how roadmaps match personal work priorities (see Appendix A). We analyzed the numeric responses, shown in Table 4. To identify themes in the textual responses, one researcher grouped responses to each question into categories, and another researcher reviewed and challenged the categorizations.

5 RESULTS

In the following two subsections we answer the research questions: what does the roadmap accomplish for the Rust community (RQ1), and how does it do so (RQ2)?

Table 3. Number of excerpts and number of codes applied per category

| Category | Num. excerpts | Num. codes applied |
|--------------------------------------------|---------------|--------------------|
| Creating a roadmap | 134 | 17 |
| Using roadmap to decline allocating effort | 33 | 13 |
| Pointing effort to roadmap topics | 26 | 12 |
| Executing a roadmap | 81 | 28 |
| Asking about a roadmap | 11 | 4 |
| Linking to roadmap documents | 28 | 2 |
| Praising the use of a roadmap | 13 | 6 |
| Criticizing the use of a roadmap | 15 | 9 |
| **Total** | **341** | **91** |

Table 4. Summary of responses to the email interview. Q1–3 asked for textual explanations accompanied by a Likert-style question on a five-point scale, where 3 is a neutral answer and 5 means the roadmap is high in influence on the respondent’s activities, helpfulness to them, and alignment with the respondent’s priorities. * = team and non-team differ (t-test, p<0.05). Questions are given in Appendix A.

| Question | Likert mean (overall) | Likert mean (team) | Likert mean (non-team) | Text answers (team) | Text answers (non-team) |
|----------------------------|------|--------|--------|----|----|
| Q1 influence (1–5 scale) | 2.8 | 3.2 | 2.3* | 10 | 6 |
| Q2 helpful (1–5 scale) | 4.1 | 4.2 | 4.0 | 11 | 7 |
| Q3 priorities (1–5 scale) | 3.5 | 3.7 | 3.1* | 11 | 3 |
| Q4 improve (text) | – | – | – | 15 | 4 |
| Q5 years (numeric) | 3.7 | 3.8 | 3.5 | – | – |
| Q6 team (yes/no) | – | 24 yes | 15 no | – | – |

5.1 Functions of the Roadmap

Building and using the roadmap appeared to serve neither the extreme of forcing team members’ agenda on the wider community, nor that of letting the broader user community choose a direction. Rather, it allowed team members and others to identify areas of consensus around project goals, and to keep focus on those goals through the year.

5.1.1 Reaching consensus of purpose among team members. The Rust team put out a call at the beginning of 2018 asking the community to submit “blogposts reflecting on Rust in 2017 and proposing goals and directions for Rust in 2018”.
An analysis of those posts and the eventual 2018 roadmap suggests that the Rust team indeed succeeded at soliciting input from people outside their team structure: only 18 of the 97 retrievable blog posts collected were authored by people listed as team members or alumni.

However, the blog posts responding to the solicitation did not seem to be a major source of novel ideas from outside the central community; the resulting roadmap document was a synthesis of shared ideas from many sources. Most (23 of 30) of the roadmap topics we could find in the blog posts were mentioned by both team and non-team blog posts. Only three topics were mentioned only by team members, and four only by non-team members. No single blog post contained more than 12 of the topics, suggesting that the roadmap really is a synthesis of many perspectives, not simply a codification of an existing consensus. Nor did the RFC-style process for accepting the roadmap after the core team had created it elicit completely new ideas from the community; rather, discussion consisted mostly of clarification and acceptance. The roadmap changed little between when the core team proposed it, on Jan 29, 2018, and its adoption on March 5th. Discussion (51 general comments and 20 comments linked to lines in the document) led to little change during that time. Besides typos, formatting, and clarifications, the main substantive change was a rewording to more strongly emphasize compiler performance. In short, the process did not appear to generate innovative new directions, but rather a consolidation of ideas that already had support but had not previously been gathered together.

5.1.2 Focusing work during the year. Analysis of the effort expended by the Rust community during 2018 demonstrates that the 2018 roadmap was neither followed religiously nor ignored completely. Rather, it represented a community focus, in the sense that its initiatives attracted proportionally more coding and discussion per issue than issues not on the roadmap. Table 5 quantifies two types of effort applied by the Rust community, broken down by type of effort (contributing to discussion in GitHub threads, or writing code).

Table 5. This table quantifies two types of effort (discussion and code contribution) applied by the Rust community, broken down by roadmap-relatedness and type of effort. The “total” figures show most discussion and coding was about non-roadmap items; however, the bytes-per-issue and lines-per-PR figures show that there was more effort per item for roadmap items.

| | Total issue text | ÷ # issues = Bytes per issue | Total lines of code | ÷ # PRs = Lines per PR |
|-------------|------------------|------------------------------|---------------------|------------------------|
| Roadmap | 31.6 MB | ÷ 2899 = 10915.0 | 246K | ÷ 680 = 362.2 |
| Non-roadmap | 78.8 MB | ÷ 9092 = 8662.2 | 923K | ÷ 3320 = 277.9 |

Fig. 3. Volume of discussion (left) and coding (right), broken down by team (n=191)/non-team (n=2392) members, and by roadmap/non-roadmap issues. The left figure measures discussion in megabytes; the right figure measures lines of code in pull requests, in thousands of lines of code. Non-roadmap matters dominated in volume, for both discussion and code. Non-team members did most discussion; team members did somewhat more coding overall. Note that team members do more work per person, but there are vastly more non-team members; and that roadmap issues involve more work per item than non-roadmap issues (see Table 5).
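As an illustration, the per-item figures in Table 5 and the log-transformed t-tests reported below can be computed from the tagged per-issue/per-PR dataset roughly as follows. The pandas DataFrame `df` and its column names are hypothetical; the study’s actual analysis scripts are not published in the paper.

```python
# Illustrative computation of the Table 5 figures and the log-transformed
# t-tests of Section 5.1.2, under assumed column names: one row per issue
# or PR, tagged by the roadmap topic heuristic.
import numpy as np
import pandas as pd
from scipy import stats

def effort_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Mean discussion bytes per issue and code lines per PR, by roadmap tag."""
    issues = df[df["kind"] == "issue"]
    prs = df[df["kind"] == "pr"]
    return pd.DataFrame({
        "bytes_per_issue": issues.groupby("in_roadmap")["discussion_bytes"].mean(),
        "lines_per_pr": prs.groupby("in_roadmap")["loc_changed"].mean(),
    })

def log_t_test(df: pd.DataFrame, col: str):
    """2-tailed t-test of log-transformed effort, roadmap vs. non-roadmap."""
    in_rm = df["in_roadmap"].astype(bool)
    a = np.log1p(df.loc[in_rm, col])   # log1p guards against zero counts
    b = np.log1p(df.loc[~in_rm, col])
    return stats.ttest_ind(a, b)       # (statistic, two-sided p-value)
```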
Roadmap matters constituted a minority of the work, but received outsize attention. In the Rust compiler project’s issue and PR threads, the Rust community generated 121,457 comments across 11,991 different discussion threads during 2018, discussing proposed and ongoing development on the Rust compiler. The hottest threads (i.e. Rust project issues or pull requests with the most bytes of discussion) were more likely to be roadmap topics: 6 of the top 10 largest issue threads were roadmap topics, but overall only 2899 out of 11991 (24%) of issues related directly to the roadmap, as measured by our heuristic ($\chi^2 = 6.989$, p=.0082). In other words, roadmap topics were a community focus, but the long tail of smaller efforts actually constituted most of the discussion. Discussion of roadmap-related issues contained on average 27.8% more text per issue than discussion of non-roadmap issues (p=.0092, 2-tailed t-test of log-transformed byte counts of issue discussions).

Although 21.1% of the lines of code added and deleted were roadmap-related, the same focus relationship applied; only about 17.0% of the PRs worked on were associated with the roadmap, but these roadmap PRs were more substantial changes, averaging 30.4% more lines of code per PR (p<.0001, 2-tailed t-test of log-transformed lines of code per pull request).

Thus although the majority of issues discussed and code changes proposed were not envisioned in the roadmap, the ones that were in the roadmap consumed proportionally more effort per issue, especially from frequent contributors. The roadmap appears to serve as a focus of attention while still allowing for a great deal of work outside its boundaries. Not everything the community agrees on requires consensus-building or needs to be in the roadmap; some priorities, such as bug fixing, are obvious.

When asked whether they followed the roadmap personally, twelve of the interviewees (PS002, PS006, PS007, PS008, PS016, PS018, PS020, PS021, PS023, PS027, PS036, PS039) replied that Rust roadmaps set a common direction for the community. Some emphasized common focus (“I think they give a clear focus point for the year, what the community wants to work on next, (...) see if we accomplished our goals and what our next ones can or should be” –PS008, email interview), while others emphasized an open, non-prescriptive attitude (“Ostensibly, they should not be called roadmaps, but they are helpful in the sense that they set *general* priorities. Of course, a lot of other things outside the roadmap will be worked on as you cannot command volunteers to do otherwise” –PS016, email interview). Another said: “Roadmaps are independent of the actual work that we can invest, so they can only ever guide” (PS021, email interview).

5.1.3 Prioritizing work for the core. Team members pay more heed to roadmap priorities than non-team members do. Although the roadmap is pitched as a description of general community priorities, there is evidence that some people, both team and non-team, perceive the roadmap as especially relevant to the activities of team developers, and as less important or binding for non-team participants.
Four of the 16 people who answered our interview question about how roadmaps influenced their decisions about what to work on indicated that the roadmap applied most to highly-involved people. One respondent, who claimed a fairly low (2/5) influence from the roadmap, said: “I started contributing for my own learning and experience, roadmaps didn’t influence me to start contributing but do influence what I contribute now that I’m more involved” (PS023, email interview). Another, who claimed high influence (5/5) from the roadmap, said, “I’m on the core team and work on subteams so the roadmap is directly related to the work I do” (PS039, email interview).

The amounts of text and code generated by participants support the idea that team members were more likely to pay attention to roadmap issues: 87 out of the 108 team members who contributed code in 2018 (81%) added a comment to at least one roadmap-related issue, while only 39% of non-team contributors did so (1065 out of 2757); this difference in proportions was significant ($\chi^2 = 75.98$, p<.00001). Still, the bulk of the work they did, regardless of role, was on non-roadmap matters: 34.1% of the text team members wrote in issue comments was in roadmap-related issues (6.9MB out of 20.3MB in Figure 3), and 21.7% of the lines of code they wrote were in roadmap-related PRs. Contributors not in teams had a similar proportion of roadmap work, with 27.4% of issue comment text and 20.0% of code lines written being roadmap-relevant. It seems that teams’ proportionally greater preference for working on roadmap issues at the individual issue level does not result in a vastly greater proportion of roadmap work done, by volume; this might be explained, for example, by team members “touching” many issues in which they do not do the bulk of the work.

Although some developers have very particular issues that they prefer to work on, others, especially team members, took cues from the roadmap when setting their own priorities. In the interviews, people gave equivocal answers to the question of whether Rust roadmaps influence their decision of what work to contribute: the average choice was 2.8 on a 1–5 scale, slightly closer to “not at all” than the scale’s midpoint of 3. People who said they were on a team rated this higher (3.2) than non-team respondents (2.3) (t-test, p<.01). Four of the people who elaborated on this question said that they felt the roadmap was mostly relevant to important issues addressed by team developers. Two specifically indicated that the presence of a feature in the roadmap gave developers confidence to work on that feature, knowing that some change they wanted to work on would be taken up by others in the community. One said they “only contribute drive-by [i.e. as a one-off edit without much community engagement] when an itch needs scratching; roadmaps do influence where I see a chance of scratching actually result in usable changes to the language” (PS001, email interview). In short, the roadmap provides encouragement to work on certain issues, for certain people, but most developers do not feel constrained to work on roadmap initiatives.

Influence between individual priorities and the roadmap ran both ways among interviewees. People rated agreement with the roadmap’s priorities slightly positively: an average of 3.5 on the 1–5 scale, with team members significantly higher at 3.7 than non-team members at 3.1 (t-test, p<.05). Out of the 14 who chose to elaborate, causality ran both ways: two said their priorities matched the roadmap’s because they helped write it, and three said they just happened to agree with its priorities; on the other hand, five said they pursued roadmap initiatives because they didn’t have their own priorities, and three said they disagreed with the priorities but valued the importance of having a shared goal more than getting their own way. One person said the roadmap priorities were too vague to resolve the disagreements that were relevant in their working team.
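The team vs. non-team participation comparison above can be checked directly from the counts reported in the text; a quick sketch with scipy (the counts are from the paper, but the code itself is ours, not the study’s analysis script):

```python
# Re-computing the chi-square test of Section 5.1.3 from the reported counts:
# 87 of 108 team members and 1065 of 2757 non-team contributors commented
# on at least one roadmap-related issue.
from scipy.stats import chi2_contingency

#        commented | did not comment
table = [[  87,  108 - 87],    # team members who contributed code
         [1065, 2757 - 1065]]  # non-team contributors
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")  # chi2 = 75.99, in line with the reported 75.98
```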
5.1.4 Creating external visibility. Some saw the roadmap as also serving to communicate the intentions of the Rust community to those outside the community, making the community’s trajectory more predictable. When first proposing the roadmap process, the author of the proposal listed among its goals “Advertise our goals as a published roadmap” and “Celebrate our achievements with an informative publicity-bomb” [1].

In our interviews, four of the 14 people (PS001, PS005, PS006, PS038) who answered our question about why roadmaps are valuable indicated that they helped the project communicate its vision and intentions outside the project. One said the roadmap helps users plan by giving them “(...) a sense of which unstable features are OK to use in a project that’s planning to switch to stable in a reasonable time frame” (PS001, email interview). Another respondent found them helpful as a way to judge their own plans to use the language: “I consider Rust to still be a young language that is not yet finalized, depending on the direction it goes it could be a deal-breaker for me” (PS038, email interview).

5.1.5 Building a sense of group identity. In online team meetings on Zulip, the largest number of roadmap mentions concerned creating roadmaps. The majority of these mentions (63%, 91/144) were by a single participant, P008, a core team member who championed both the roadmap and the formation and strengthening of Rust’s team structure in 2018. P008’s rhetorical use of the roadmap included emphasizing the need to start a separate roadmap, e.g. for a subproject, and suggesting and collecting roadmap topics for existing roadmaps. P008 emphasized the benefits of having roadmaps, such as successful collaborative work (“a key step in any successful WG is going to be forming a **roadmap**” –P008, core team member, online meeting), structuring work processes (“I think encouraging people to outline a roadmap with specific steps is a good idea” –P008, core team member, online meeting), and reaching bigger, shared goals. They argued, for example, that creating roadmaps is worth the effort put into it (“it’s worth taking the time to make the roadmap” –P008, core team member, online meeting) and that work time is needed to create roadmaps.

A few non-team members also mentioned a need for roadmaps to organize work effort (“We need to open issues first, and to have some kind of roadmap” –P040, non-team member, online meeting) but were overall less committed to making decisions about how to create and manage roadmaps (“not sure if we want to wait and collect all the appropriate tool/subteam roadmaps and publish one collectively?” –P038, non-team member, online meeting).
In online meetings, non-team members’ comments mostly showed strong support for roadmap creation in reaction to suggestions made by core team members (“I think a roadmap is definitely a good idea, something to get working groups working towards a goal could be helpful in keeping them active” –P045, non-team member, online meeting) or praised the effort made by team members to create and apply roadmaps (“I applaud all this, can’t agree more on everything :)” –P048, non-team member, online meeting).

Team members understood roadmaps as a useful planning tool for ongoing and future work, for managing working groups, and for attracting more contributors by presenting work areas and goals. Roadmaps functioned to manifest the topics working groups should focus on over a certain time, which is why team members gently pushed towards creating roadmaps, for example by suggesting a new group begin with a very lightweight alternative to the complex community-wide process (“I’m not imagining very long ‘roadmaps’, just some bullets” –P008, core team member, online meeting). The team members’ effort to have contributors and working groups start roadmaps illustrates the need and the goal to organize and manifest work in written form, and how especially the core team tries to manage larger and general goals for the distributed Rust community.

5.1.6 Summary. The Rust community’s team members began with a diverse set of priorities as individuals: the roadmap process was a way for team members to decide on a consensus focus of attention, and to commit to applying themselves to those things during the year. It gave them a way to define themselves more strongly as a group by knowing that they had a shared purpose, and there is some evidence that it gave peripheral participants a way to assert their identity with a group, and in-group members a way to gently channel outside contributions away from distracting alternate paths. Although the process explicitly listened to input from outside the community of Rust team membership, it did not in practice bring significant new ideas from outsiders into the conversation.

5.2 Mechanisms of the Roadmap

A roadmap written and never referred to again might simply gather dust and bear no relation to subsequent activity. The Rust community, however, appears to take the roadmap seriously after it is written. Individuals used it to gauge whether their own ideas were likely to be supported by others, to strengthen the formation of teams, to discuss and argue with each other to encourage or discourage proposed efforts, and to reflect on progress.

5.2.1 Assembling work groups. Although the roadmap, during its creation phase, helps the whole community build consensus about its overall goals, developers also use it to find each other and form collaborations to do more particular tasks.

In blog posts, team and non-team members alike mentioned personal or project roadmaps as a way to inform each other about work activities and promote plans of action.
For example, they referred to detailed goals in project roadmaps ("There's a bit more detail on the project roadmap" –P091, non team member, blog post) or pointed out roadmap goals for work groups ("Embedded is one of the four target domains in the Rust 2018 Roadmap (...)" –P084, non team member, blog post).

In one issue comment, a contributor urged others to contribute ideas to the roadmap call for blog posts in order to influence the Rust roadmap ("Please write a Rust 2019 blog post and express this concern. I think if enough of us do that, we can influence the roadmap" –P021, team member, issue comment).

Core team members in early 2018 pushed for the creation of formal working groups for the domains defined as focus areas in the roadmap. In blog posts, team members emphasized that work effort would be aimed at domain working groups ("the primary focus of this year's project is (...) the domain working groups that we kicked off with our 2018 Roadmap" –P076, core team member, blog post), and team leaders urged the community to allocate its resources to those groups. Blog posts at the time announced new working groups for a domain or argued for reorganizing existing working groups to better meet roadmap goals ("The dev-tools team should be reorganised to continue to scale and to support the goals in this roadmap" –P077, team member, blog post).

Conversely, although the roadmaps are not promoted as a complete list of things to work on, they also serve to pre-warn developers that some things they might work on would be unlikely to attract much support or collaboration. In some RFC, issue, and PR comments, team members used the roadmap to refer to the overall direction Rust should take. Even without definite future goals, the mere existence of a roadmap process served to reject proposals that did not match potential goals. This included explanations such as that it was not the right time or not the right trend ("While the details of roadmap is still in play, (...) this seems like a clear expansion with insufficiently strong motivation" –P008, core team member, RFC comment), or not the right perspective ("I don't think that major rework of enums currently aligns well with our current priorities or those priorities we are likely to set in the upcoming roadmap" –P008, core team member, RFC comment).

5.2.2 Discouraging non-roadmap RFCs and providing a basis for rejecting proposals. Team membership appears to affect how people talk about the roadmap. Team members' roadmap mentions in RFC, issue, and PR comments were intended to point contributors toward roadmap topics and away from the RFC proposal at hand ("I'd like to draw attention to our 2018 roadmap" –P012, core team member, RFC comment). However, team members often still valued developers' ideas and encouraged future work, for example by raising the prospect that a feature could make it onto the upcoming roadmap ("could be an interesting thing to consider for next year's roadmap" –P002, team member, RFC comment).

The roadmap gave team members, and especially core team members, a justification for dismissing proposals that did not fit well with the community's vision for Rust, or that would divert too much effort from current initiatives.
In comments on GitHub, the roadmap was mostly invoked as an argument for team members to decline proposed RFCs that did not seem to fit roadmap goals ("it's not the kind of change that's targeted for the roadmap this year" –P002, team member, RFC comment). This argumentative strategy goes against the perception of the roadmap as a mere guideline, instead posing roadmap goals as boundaries delimiting where work and effort should be allocated. Only some comments gave additional explanations for declining such RFCs in relation to the roadmap. For example, the roadmap was treated as a strict work plan when a proposal posed a possible threat to achieving roadmap goals ("I am pretty worried if we delay now we will have a hard time delivering on our roadmap for the year" –P007, team member, issue comment). Team members also used the roadmap to reinforce a reason perceived as true but insufficient on its own to end RFC or issue discussions, for example when a proposal did not generate enough community interest ("There hasn't been a lot of activity on this RFC (...) it also doesn't particularly fit the roadmap" –P008, core team member, RFC comment). Team members also judged the adequacy of RFC discussions against the roadmap goals ("I also don't think this RFC is of high enough priority to the Rust roadmap to devote a lot of attention to reaching consensus" –P018, core team member, RFC comment). In other words, features that did not match the roadmap were not worth the effort of finding consensus within the community.

Although non-team members rarely used the roadmap to argue against features, one contributor mentioned the roadmap to speak out against an issue ("Finally, 'abstract type's are not close on the roadmap" –P011, non team member, RFC comment). Beyond its role in consolidating a consensus when it was created, then, the roadmap is also used as an argumentative resource for encouraging work on shared goals, and for discouraging work (and even extended discussion of work) that risks becoming a distraction.

5.2.3 Reason to promote particular issues and PRs. In issue and PR comments, we found that non-team members mostly mentioned the roadmap by referring to, supporting, or emphasizing roadmap goals in issue discussions, or by asking for clarification about the status of roadmap goals. They often argued in favor of features that were on or related to the roadmap ("Using build systems other than/in addition to Cargo is explicitly a goal in the 2018 roadmap" –P028, non team member, issue comment). They often cited the roadmap as a strong reference when arguing for working on or implementing features, sometimes even with reference to previous roadmap topics ("Cargo being able to integrate into larger build systems was I think on the 2017 roadmap" –P009, non team member, RFC comment). In discussing work effort in issues and PRs, participants also pointed roadmap goals out to others ("Note for those who haven't seen yet: macros 2.0 is apparently slated to be _stable_ later this year, according to the proposed roadmap" –P021, team member, issue comment).

5.2.4 Shared basis for later reflection. The Rust roadmap process promises a retrospective reflection at the end of each year [1]. As part of that, the Rust core team asked people to reflect on 2018's roadmap when proposing ideas for the 2019 roadmap.
The reflections within these posts mostly evaluated progress on the roadmap's particular initiatives. For example, posters praised progress on WebAssembly ("2018 has been a really cool year for WASM and Rust" –P116, team member, blog post reflecting) or on futures and async/await ("A lot of progress was made on Futures async/await in 2018" –P110, team member, blog post reflecting). People also criticized the lack of progress on unfinished tooling ("Tooling was a large part of the goal for Rust 2018. If one gets lucky, tooling around editor and IDE support can "just work", but many times it doesn't." –P071, non team member, blog post reflecting) or on missing libraries. Other posts commented on the features themselves, claiming that changes made had no actual benefit for users or were mistimed.

Reflections about the process itself were relatively rare. Developers mentioned that community collaborative work processes had not yet improved as planned and that the community still needed to better manage exhaustion and time spent on topics in general ("many of the key contributors to rustc (...) were put under an enormous amount of pressure to get their changes shipped by the deadline" –P086, non team member, blog post reflecting). Moving into 2019, as the effort of reflecting on 2018 waned, blog posts mentioning roadmaps mostly highlighted work group achievements, such as developments in the Rust package manager, cargo; WebAssembly goals and stabilization; and the growth and increased productivity of Rust teams. This seems consistent with the 2019 roadmap's shift in emphasis towards team-specific roadmaps.

In our email interviews, 19 people (PS002, PS005, PS006, PS007, PS008, PS013, PS016, PS018, PS021, PS023, PS025, PS028, PS029, PS030, PS032, PS035, PS036, PS037, PS039) responded to our question about how roadmaps could be improved; all but two of them were people on teams. Most of the suggestions seemed aimed at reinforcing the roadmap's role as a commitment to achieve goals. The most common suggestion (7 respondents: PS006, PS007, PS008, PS028, PS030, PS032, PS035) was better reflection about the process, in most cases at the end of the year during preparation of the next roadmap. One respondent said: "It'd be nice to have a retrospective that examines how much work for the year kept to plan, and to give a summary of how the language advanced in the desired direction" (PS007, email interview). Seven respondents were satisfied with the process (PS002, PS036, PS037) or said they had no opinion (PS005, PS013, PS023, PS029), but the rest had ideas for improvements. Other suggestions were: less ambitious goals, more specific and concrete goals, and better estimation of effort levels. Only two non-team members responded to this question; one of them called for more stakeholder involvement, saying: "Figuring out low threshold way of bringing library stakeholders into the projects where minimal time commitment is paramount" (PS018, email interview).

5.2.5 Summary. The intention and process of creating a roadmap gave the community an opportunity, and a shared artifact, around which to discuss and balance priorities, and to define boundaries and shared purpose when forming teams. During the year it was in effect, community members used it in online discourse as justification both for discouraging off-topic work and for encouraging on-topic work.
It also tipped the balance in individual decision-making about work allocation by providing evidence that on-topic efforts would be supported by other community members. Afterwards, it served as a standard against which to evaluate progress over the year.

6 DISCUSSION

Rust's roadmap process strikes a balance between openness to new ideas and people, and unifying around common goals. Because Rust is a popular programming language, there are many potential contributors who could be welcomed and encouraged to help; but as mentioned above in Subsection 2.1.3, eliciting help from the peripheries of a community requires a balance between welcoming openness and predictable direction. Rust's process seems to strike that balance by creating some ceremony around the transition from openness to direction: the community welcomes input when building the roadmap, then visibly commits to one direction when the roadmap is released. Although few new ideas from outsiders appear to enter the roadmap through this process, those ideas are enumerated, summarized, and listened to. The fact that new ideas from outsiders have a non-zero chance of being heeded may well be important for encouraging participation, just as the infinitesimal but non-zero chance of winning a lottery is effective in encouraging broad participation.

Another advantage of the transparent roadmap creation process is that it confers legitimacy on the governing process [31]. A document with no visible grounding in such a process might be distrusted as out of date, as one individual's interpretation of the community's goals, or even as the intentions of a sponsoring organization like Mozilla. In contrast, by offering prospective contributors the ability to gain knowledge of, and trust in, a community's true intentions, Rust might be allowing them to more quickly gain a sense of belonging to the community, a well-studied motivator for contribution [38]. The fact that we observed non-team participants encouraging others to work on PRs relevant to the roadmap suggests that they may be visibly signalling their commitment to the community by demonstrating their familiarity with the roadmap.

When individual contributors can trust that planned work will be done by others in a known timeframe, "divide and conquer" approaches to coordination may become more viable. Howison and Crowston [39] found concurrent development of dependent contributions to be rare in open source. When studying how open source projects performed complex multi-person tasks, they observed developers either immediately adding contributions when the necessary supporting code was already in place, or deferring contributions in the hope that such support would someday become available. They did not observe a pattern of multi-person interdependent work, in which one developer proceeds on a feature trusting that another developer is writing supporting code at the same time. We hypothesize that such co-work may be more common in projects that provide some trustable signal about others' intentions.
Searching for such examples in Rust would be fruitful future work.

**Team members, particularly the core team itself, play an important role in curating suggestions and articulating a common vision.** The core team influences the consensus built and maintained by the roadmap process by:

- Framing community survey questions and requests for pre-roadmap blog posts, then choosing among the answers to build a coherent set of initiatives.
- Using their visibility and respect to argue for their vision publicly, in blog posts, RFC and issue discussions, forums, and team meetings.
- Holding voting privileges over RFCs and merge rights for PRs; as mentioned earlier, while most accepted RFCs do not align with the roadmap, the roadmap is sometimes used as a way to frame the rejection of RFCs, usually ones that are problematic for other reasons.
- Taking a role similar to a manager's; this can be seen, for example, in P008's strategy of steering team and contributor effort by using the roadmap as an agreed-upon point of validation.

7 IMPLICATIONS FOR OTHER PROJECTS

A case study is useful for providing a deep example of how a process has played out in the real world: as such, it can provide experiences that other projects can learn from, but a project considering roadmapping needs to consider how it applies to its own context.

A project may want to consider a roadmapping process if it is struggling to balance diverging priorities and wants to strengthen a sense of shared direction. Based on our observation of a single case, we suggest the following guidance:

- Actively solicit input from the larger community of developers as well as the core team. As we saw in this case, the overlap in ideas can be very helpful in identifying areas of consensus that already exist, and in letting those harboring ideas that lack consensus know that significant aggregate effort is unlikely to be applied to them.
- Adopt a non-zero number of ideas from the community. It seems likely that in order to keep the larger community engaged and interested, a few of the ideas from beyond the core team should make it into the roadmap.
- Make the evaluation process open and fair. As with any form of governance, fairness and openness convey a sense of legitimacy around the decision-making and enhance the likelihood that the community will accept and act on the roadmap.
- Don't expect all – or even most – of the development work and discussion to focus on roadmap items. Nevertheless, significant progress on those items can be made, especially by the most frequent contributors.
- Reflect on the community's progress against the roadmap and on the process by which the roadmap was constructed; this can be helpful in creating future versions.

As we caution in the next section, however, this paper describes Rust's experience building a roadmap process for its own particular needs. It is not clear how this process would need to differ for a community building different software, with different developers, for different users.

8 THREATS TO VALIDITY

Our results rely in part on detailed qualitative analysis. Qualitative studies mostly do not aim at generalizability but at providing "a rich, contextualized understanding of human experience through the intensive study of particular cases" [63].
We looked at the Rust community as a case study of how OSS communities use roadmaps as organizational tools to manage and allocate work effort toward shared goals. Interviewees may not have been representative of the entire community; although our response rate was fairly high, there is a long tail of contributors, and there may be some self-selection bias, especially among low-volume contributors.

We do not know how typical Rust is of OSS communities with regard to its roadmap, so we only speculate about how our findings might apply beyond Rust.

We identified a specific list of roadmap topics and classified issues, PRs, and RFCs according to those topics using a heuristic, described in Appendix B, that may miscount which work is or is not roadmap-related. The boundaries of these topics are not well-defined, since features interact, and work on a non-roadmap feature may be needed where it interacts with a roadmap feature, or vice versa. However, we relied on titles and labels assigned by the community themselves, and our mapping from roadmap topics to labels in many cases had a great deal of face validity.

We do not attempt to tease out the effectiveness of roadmaps as a coordination mechanism compared to other ways of governing. Our focus was on understanding how this community constructed and used roadmaps. Future work could address questions of effectiveness by, for example, comparing quality, productivity, or community satisfaction before and after roadmap adoption.

9 CONCLUSIONS

In this work we set out to understand the functions of roadmaps for the Rust community, and how the community used them to fulfill those functions. To do this, we qualitatively examined the creation of, management of, and reflection on consensus through the roadmap process, and estimated the proportions of roadmap-related work done throughout the planned year.

We have shown that the roadmap's purposes included building and legitimizing consensus, focusing and prioritizing collective attention (particularly for team members), building group identity, and creating external visibility for the community's plans.

The community accomplishes these purposes by assembling work groups around the roadmap's structure, using roadmap goals as justification for directing people toward roadmap-related work, and using the roadmap to ground reflection at the end of the year when planning for the next one.

The roadmap's power to influence contributors' choices during the year comes from the fact that it comprises exactly those initiatives where collaborators are willing to help. Its transparent process provides evidence of that willingness to other developers who are deciding where to contribute their effort. During the roadmapped year, rather than strictly constraining activity, the roadmap functioned to nudge contributors toward collectively agreed-upon topics when their focus might otherwise wander to other, individually motivated topics. In this way, the roadmap enables the community to guide itself to areas of mutual interest, rather than commanding effort on shared goals.

It thus guides the community without the need to exert hierarchical power, and provides a useful prediction about future development for people working on dependent projects.

REFERENCES

[1] Brian Anderson. 2016. Feature: north-star. https://github.com/brson/rfcs/blob/north-star/text/0000-north-star.md Last accessed 13 January 2020.

[2] John Anvik, Lyndon Hiew, and Gail C Murphy.
2006. Who Should Fix This Bug? In *Proc. International Conference on Software Engineering* (Shanghai, China) (ICSE '06). ACM, New York, NY, USA, 361–370.

[3] Open Service Broker API. 2019. Roadmap & Release Planning. https://github.com/openservicebrokerapi/servicebroker/projects/1 Last accessed 13 January 2020.

[4] A Barcomb, A Kaufmann, D Riehle, K Stol, and B Fitzgerald. 2018. Uncovering the Periphery: A Qualitative Survey of Episodic Volunteering in Free/Libre and Open Source Software Communities. *IEEE Trans. Software Eng.* (2018), 1–1.

[5] Hoda Baytiyeh and Jay Pfaffman. 2010. Open source software: A community of altruists. *Comput. Human Behav.* 26, 6 (Nov. 2010), 1345–1354.

[6] Stefan Kambiz Behfar, Ekaterina Turkina, and Thierry Burger-Helmchen. 2018. Knowledge management in OSS communities: Relationship between dense and sparse network structures. *Int. J. Inf. Manage.* 38, 1 (Feb. 2018), 167–174.

[7] Willem Bekkers, Inge van de Weerd, Marco Spruit, and Sjaak Brinkkemper. 2010. A Framework for Process Improvement in Software Product Management. In *Systems, Software and Services Process Improvement*. Springer Berlin Heidelberg, 1–12.

[8] Mariette Bengtsson. 2016. How to plan and perform a qualitative study using content analysis. *NursingPlus Open* 2 (2016), 8–14.

[9] Yochai Benkler. 2002. Coase's Penguin, or, Linux and "The Nature of the Firm". *Yale Law J.* (2002), 369–446.

[10] Bruce Lawrence Berg and Howard Lune. 2004. Qualitative research methods for the social sciences. Vol. 5. Pearson, Boston, MA.

[11] Matthew J Bietz, Eric P S Baumer, and Charlotte P Lee. 2010. Synergizing in Cyberinfrastructure Development. *Comput. Support. Coop. Work* 19, 3-4 (July 2010), 245–281.

[12] Christopher Bogart, Christian Kästner, James Herbsleb, and Ferdian Thung. 2016. How to Break an API: Cost Negotiation and Community Values in Three Software Ecosystems. In *Proc. International Symposium on Foundations of Software Engineering* (Seattle, WA, USA) (FSE 2016). ACM, New York, NY, USA, 109–120.

[13] Yuanfeng Cai and Dan Zhu. 2016. Reputation in an open source software community: Antecedents and impacts. *Decis. Support Syst.* 91 (Nov. 2016), 103–112.

[14] AWS Cloudformation. 2018. CloudFormation Public Coverage Roadmap. https://github.com/aws-cloudformation/aws-cloudformation-coverage-roadmap Last accessed 13 January 2020.

[15] J Coelho, M T Valente, L L Silva, and A Hora. 2018. Why We Engage in FLOSS: Answers from Core Developers. In *Intl. Workshop on Cooperative and Human Aspects of Software Engineering* (CHASE). 114–121.

[16] John W Creswell and Vicki L Plano Clark. 2017. Designing and conducting mixed methods research. Sage Publications.

[17] John W Creswell and Cheryl N Poth. 2016. Qualitative inquiry and research design: Choosing among five approaches. Sage Publications.

[18] Kevin Crowston and Ivan Shamshurin. 2016. Core-Periphery Communication and the success of free/libre open source software projects. *IFIP Advances in Information and Communication Technology* 472 (2016), 45–56. https://doi.org/10.1007/978-3-319-39225-7

[19] Laura Dabbish, Colleen Stuart, Jason Tsay, and Jim Herbsleb. 2012. Social Coding in GitHub: Transparency and Collaboration in an Open Software Repository. In *Proc. Conference on Computer Supported Cooperative Work* (Seattle, Washington, USA) (CSCW '12).
ACM, New York, NY, USA, 1277–1286.

[20] Carlo Daffara. 2012. Estimating the economic contribution of open source software to the European economy. In *The First Openforum Academy Conference Proceedings*.

[21] Jean-Michel Dalle, Paul A David, and others. 2003. The allocation of software development resources in 'open source' production mode. *SIEPR-Project NOSTRA Working Paper* (2003). [Accepted for publication in Joe Feller, Brian Fitzgerald, Scott Hissam, and Karim Lakhani (eds.), *Making Sense of the Bazaar*, MIT Press, 2004.]

[22] Premkumar Devanbu, Pallavi Kudigrama, Cindy Rubio-González, and Bogdan Vasilescu. 2017. Timezone and Time-of-day Variance in GitHub Teams: An Empirical Method and Study. In *Proc. International Workshop on Software Analytics* (Paderborn, Germany) (SWAN 2017). ACM, New York, NY, USA, 19–22.

[23] Zakir Durumeric, Frank Li, James Kasten, Johanna Amann, Jethro Beekman, Mathias Payer, Nicolas Weaver, David Adrian, Vern Paxson, Michael Bailey, and J. Alex Halderman. 2014. The Matter of Heartbleed. In *Proc. Internet Measurement Conference* (Vancouver, BC, Canada) (IMC '14). ACM, New York, NY, USA, 475–488. https://doi.org/10.1145/2663716.2663755

[24] Christof Ebert. 2007. The impacts of software product management. *J. Syst. Softw.* 80, 6 (June 2007), 850–861.

[25] Christof Ebert and Sjaak Brinkkemper. 2014. Software product management–An industry evaluation. *J. Syst. Softw.* 95 (2014), 10–18.

[26] Nadia Eghbal. 2016. Roads and Bridges: The unseen labor behind our digital infrastructure. Technical Report. Ford Foundation.

[27] Anna Filippova and Hichang Cho. 2016. The Effects and Antecedents of Conflict in Free and Open Source Software Development. In *Proc. Conference on Computer Supported Cooperative Work & Social Computing* (CSCW). 705–716.

[28] Brian Fitzgerald. 2006. The Transformation of Open Source Software. *MIS Quarterly* 30, 3 (2006), 587–598.

[29] Uwe Flick. 2018. *An introduction to qualitative research*. Sage Publications.

[30] Samuel A Fricker. 2012. Software product management. In *Software for People*. Springer, 53–81.

[31] Archon Fung. 2006. Varieties of Participation in Complex Governance. *Public Administration Review* 66, s1 (2006), 66–75.

[32] Michael J. Gallivan. 2001. Striking a balance between trust and control in a virtual organization: A content analysis of open source software case studies. *Information Systems Journal* 11, 4 (2001), 277–304. https://doi.org/10.1046/j.1365-2575.2001.00108.x

[33] Mohammad Gharehyazie, Daryl Posnett, Bogdan Vasilescu, and Vladimir Filkov. 2015. Developer initiation and social interactions in OSS: A case study of the Apache Software Foundation. *Empirical Software Engineering* 20, 5 (Oct. 2015), 1318–1353.

[34] Shane Greenstein and Frank Nagle. 2014. Digital dark matter and the economic contribution of Apache. *Research Policy* 43, 4 (May 2014), 623–631.

[35] Gordon Haff. 2018. How Open Source Ate Software: Understand the Open Source Movement and So Much More. Apress.

[36] A. Hars and Shaosong Ou. 2001. Working for free? Motivations of participating in open source projects. In *Proc. Hawaii International Conference on System Sciences*. 9 pp.

[37] Andrea Hemetsberger and Christian Reinhardt. 2009.
Collective development in open-source communities: An activity theoretical perspective on successful online collaboration. *Organization Studies* 30, 9 (2009), 987–1008. https://doi.org/10.1177/0170840609339241

[38] Guido Hertel, Sven Niedner, and Stefanie Herrmann. 2003. Motivation of software developers in Open Source projects: an Internet-based survey of contributors to the Linux kernel. *Research Policy* 32, 7 (July 2003), 1159–1177.

[39] James Howison and Kevin Crowston. 2014. Collaboration through open superposition: a theory of the open source way. *MIS Q.* 38, 1 (2014), 29–50.

[40] Chris Jensen and Walt Scacchi. 2010. Governance in open source software development projects: A comparative multi-level analysis. In *IFIP International Conference on Open Source Systems*. Springer, 130–142.

[41] Hans-Bernd Kittlaus and Samuel A Fricker. 2017. Software Product Management: The ISPMA-Compliant Study Guide and Handbook. Springer.

[42] Florian Kohlbacher. 2006. The use of qualitative content analysis in case study research. In *Forum Qualitative Sozialforschung/Forum: Qualitative Social Research*, Vol. 7. Institut für Qualitative Forschung, 1–30.

[43] Klaus Krippendorff. 2018. Content analysis: An introduction to its methodology. Sage Publications.

[44] Sandeep Krishnamurthy, Shaosong Ou, and Arvind K Tripathi. 2014. Acceptance of monetary rewards in open source software development. *Research Policy* 43, 4 (2014), 632–644.

[45] K Lakhani. 2005. Why Hackers Do What They Do: Understanding Motivation and Effort in Free/Open Source Software Projects. *Perspectives on Free and Open Source Software* (2005), 3–21.

[46] Charlotte P Lee, Paul Dourish, and Gloria Mark. 2006. The human infrastructure of cyberinfrastructure. *Comput. Support. Coop. Work* (2006), 483–492.

[47] Jung Hoon Lee, Hyung-Il Kim, and Robert Phaal. 2012. An analysis of factors improving technology roadmap credibility: A communications theory assessment of roadmapping processes. *Technol. Forecast. Soc. Change* 79, 2 (Feb. 2012), 263–280.

[48] M M Lehman, J F Ramil, P D Wernick, D E Perry, and W M Turski. 1997. Metrics and laws of software evolution–the nineties view. In *Proc. Fourth International Software Metrics Symposium*. IEEE, 20–32.

[49] Andrey Maglyas, Uolevi Nikula, and Kari Smolander. 2013. What are the roles of software product managers? An empirical investigation. *J. Syst. Softw.* 86, 12 (Dec. 2013), 3071–3090.

[50] M Lynne Markus. 2007. The governance of free/open source software projects: Monolithic, multidimensional, or configurational? *Journal of Management and Governance* 11, 2 (2007), 151–163.

[51] Niko Matsakis. 2015. Priorities after 1.0. https://internals.rust-lang.org/t/priorities-after-1-0/1901 Last accessed 13 January 2020.

[52] Philipp Mayring. 2004. Qualitative content analysis. *A Companion to Qualitative Research* 1 (2004), 159–176.

[53] Nora McDonald, Sarita Schoenebeck, and Andrea Forte. 2019. Reliability and inter-rater reliability in qualitative research: Norms and guidelines for CSCW and HCI practice. *Proceedings of the ACM on Human-Computer Interaction* 3, CSCW (2019), 1–23.

[54] Rebeca Méndez-Durón. 2013. Do the allocation and quality of intellectual assets affect the reputation of open source software projects? *Information & Management* 50, 7 (Nov. 2013), 357–368.

[55] Martin Michlmayr, Francis Hunt, and David Probert. 2007.
Release management in free software projects: Practices and problems. *IFIP Int. Fed. Inf. Process.* 234 (2007), 295–300.

[56] A Mockus, D M Weiss, and Ping Zhang. 2003. Understanding and predicting effort in software projects. In *Proc. 25th International Conference on Software Engineering*. IEEE, 274–284.

[57] Jürgen Münch, Stefan Trieflinger, and Dominic Lang. 2019. Product roadmap–from vision to reality: a systematic literature review. In *2019 IEEE International Conference on Engineering, Technology and Innovation (ICE/ITMC)*. IEEE, 1–8.

[58] Siobhán O'Mahony and Beth A Bechky. 2008. Boundary organizations: Enabling collaboration among unexpected allies. *Administrative Science Quarterly* 53, 3 (2008), 422–459.

[59] Stack Overflow. 2019. Most Loved, Dreaded, and Wanted Languages. https://insights.stackoverflow.com/survey/2019#technology-_-most-loved-dreaded-and-wanted-languages Last accessed 13 January 2020.

[60] Gang Peng, Yun Wan, and Peter Woodlock. 2013. Network ties and the success of open source software development. *The Journal of Strategic Information Systems* 22, 4 (Dec. 2013), 269–281.

[61] Robert Phaal and Gerrit Muller. 2009. An architectural framework for roadmapping: Towards visual strategy. *Technol. Forecast. Soc. Change* 76, 1 (Jan. 2009), 39–49.

[62] Gustavo Pinto, Luiz Felipe Dias, and Igor Steinmacher. 2018. Who Gets a Patch Accepted First? Comparing the Contributions of Employees and Volunteers. In *Proc. 11th International Workshop on Cooperative and Human Aspects of Software Engineering* (Gothenburg, Sweden) (CHASE '18). ACM, New York, NY, USA, 110–113.

[63] Denise F Polit and Cheryl Tatano Beck. 2010. Generalization in quantitative and qualitative research: Myths and strategies. *International Journal of Nursing Studies* 47, 11 (2010), 1451–1458.

[64] Germán Poo-Caamaño, Eric Knauss, Leif Singer, and Daniel M German. 2017. Herding cats in a FOSS ecosystem: a tale of communication and coordination for release management. *Journal of Internet Services and Applications* 8, 1 (2017).

[65] Germán Poo-Caamaño, Leif Singer, Eric Knauss, and Daniel M German. 2016. Herding cats: A case study of release management in an open collaboration ecosystem. *IFIP Adv. Inf. Commun. Technol.* 472 (2016), 147–162.

[66] Huilian Sophie Qiu, Alexander Nolte, Anita Brown, Alexander Serebrenik, and Bogdan Vasilescu. 2019. Going Farther Together: The Impact of Social Capital on Sustained Participation in Open Source.

[67] Hector Ramos. 2018. Open Source Roadmap. https://facebook.github.io/react-native/blog/2018/11/01/oss-roadmap Last accessed 13 January 2020.

[68] David Ribes and Thomas A Finholt. 2009. The long now of infrastructure: Articulating tensions in development. *Journal of the Association for Information Systems* (2009).

[69] Rust. 2019. Governance. https://www.rust-lang.org/governance Last accessed 13 January 2020.

[70] Rust. 2019. Production users. https://www.rust-lang.org/production/users Last accessed 13 January 2020.

[71] Read Rust. 2018. Rust 2018: Hopes and dreams for Rust in 2018. https://readrust.net/rust-2018 Last accessed 13 January 2020.

[72] Read Rust. 2019. Rust 2019: Ideas from the community for Rust in 2019, and the next edition. https://readrust.net/rust-2019 Last accessed 13 January 2020.

[73] W Scacchi. 2002. Understanding the requirements for developing open source software systems.
*IEE Proceedings – Software* 149, 1 (Feb. 2002), 24–39.

[74] Sonali K Shah. 2006. Motivation, Governance, and the Viability of Hybrid Forms in Open Source Software Development. *Manage. Sci.* 52, 7 (July 2006), 1000–1014.

[75] Maha Shaikh and Ola Henfridsson. 2017. Governing open source software through coordination processes. *Information and Organization* 27, 2 (2017), 116–135.

[76] Cuihua Shen and Peter Monge. 2011. Who connects with whom? A social network analysis of an online open source software community. *First Monday* 16, 6 (June 2011).

[77] Param Vir Singh, Yong Tan, and Vijay Mookerjee. 2011. Network Effects: The Influence of Structural Capital on Open Source Project Success. *MIS Quarterly* 35, 4 (2011), 813–829.

[78] Matthias Stürmer. 2013. Four types of open source communities. https://opensource.com/business/13/6/four-types-organizational-structures-within-open-source-communities Last accessed 5 January 2020.

[79] Tanja Suomalainen, Outi Salo, Pekka Abrahamsson, and Jouni Similä. 2011. Software product roadmapping in a volatile business environment. *Journal of Systems and Software* 84, 6 (2011), 958–975.

[80] Yong Tan, Vijay Mookerjee, and Param Singh. 2007. Social capital, structural holes and team composition: Collaborative networks of the open source software community. In *Proc. International Conference on Information Systems* (2007), 155.

[81] Antony Tang, Taco de Boer, and Hans van Vliet. 2011. Building roadmaps: a knowledge sharing perspective. In *Proc. International Workshop on SHAring and Reusing Architectural Knowledge*. 13–20.

[82] Niels C Taubert. 2008. Balancing requirements of decision and action: Decision-making and implementation in free/open source software projects. *Science, Technology & Innovation Studies* 4, 1 (2008), 69–88.

[83] Jonathan Taylor. 2017. Rust 2017 Survey Results. https://blog.rust-lang.org/2017/09/05/Rust-2017-Survey-Results.html Last accessed 13 January 2020.

[84] Libra Engineering Team. 2019. Libra Core Roadmap #2. https://developers.libra.org/blog/2019/12/17/libra-core-roadmap-2 Last accessed 13 January 2020.

[85] Scala Team. 2017. Scala 2.13 Roadmap. https://www.scala-lang.org/news/roadmap-2.13.html Last accessed 13 January 2020.

[86] The Rust Core Team. 2018. A call for Rust 2019 Roadmap blog posts. https://blog.rust-lang.org/2018/12/06/call-for-rust-2019-roadmap-blogposts.html Last accessed 13 January 2020.

[87] The Rust Core Team. 2018. New Year's Rust: A Call for Community Blogposts. https://blog.rust-lang.org/2018/01/03/new-years-rust-a-call-for-community-blogposts.html Last accessed 13 January 2020.

[88] The Rust Core Team. 2018. Rust's 2018 roadmap. https://blog.rust-lang.org/2018/03/12/roadmap.html Last accessed 13 January 2020.

[89] The Rust Core Team. 2019. Rust's 2019 Roadmap. https://blog.rust-lang.org/2019/04/23/roadmap.html Last accessed 13 January 2020.

[90] The Rust Survey Team. 2018. Rust Survey 2018 Results. https://blog.rust-lang.org/2018/11/27/Rust-survey-2018.html Last accessed 13 January 2020.

[91] Jonathan Turner. 2016. 2016 Rust Commercial User Survey Results. https://internals.rust-lang.org/t/2016-rust-commercial-user-survey-results/4317 Last accessed 13 January 2020.

[92] Jonathan Turner. 2016. State of Rust Survey 2016. https://blog.rust-lang.org/2016/06/30/State-of-Rust-Survey-2016.html Last accessed 13 January 2020.

[93] Aaron Turon. 2016. Refining Rust's RFCs.
http://aturon.github.io/blog/2016/07/05/rfc-refinement/ Last accessed 13 January 2020.

[94] Aaron Turon. 2017. Rust's 2017 Roadmap. https://blog.rust-lang.org/2017/02/06/roadmap.html Last accessed 13 January 2020.

[95] Tuukka Turunen. 2018. Qt Roadmap for 2018. https://www.qt.io/blog/2018/02/22/qt-roadmap-2018 Last accessed 13 January 2020.

[96] I van de Weerd, S Brinkkemper, R Nieuwenhuis, J Versendaal, and L Bijlsma. 2006. Towards a Reference Framework for Software Product Management. In *International Requirements Engineering Conference* (RE'06). 319–322.

[97] Konstantin Vishnevskiy, Oleg Karasev, and Dirk Meissner. 2015. Integrated roadmaps and corporate foresight as tools of innovation management: The case of Russian companies. *Technol. Forecast. Soc. Change* 90 (Jan. 2015), 433–443.

[98] Georg Von Krogh, Stefan Haefliger, Sebastian Spaeth, and Martin W Wallin. 2012. Carrots and rainbows: Motivation and social practice in open source software development. *MIS Quarterly* (2012), 649–676.

[99] Kangning Wei, Kevin Crowston, U Yeliz Eseryel, and Robert Heckman. 2017. Roles and politeness behavior in community-based free/libre open source software development. *Information & Management* 54, 5 (July 2017), 573–582.

[100] Joel West and Scott Gallagher. 2006. Challenges of open innovation: the paradox of firm investment in open-source software. *R&D Management* 36, 3 (2006), 319–331.

[101] Joel West and Siobhán O'Mahony. 2008. The Role of Participation Architecture in Growing Sponsored Open Source Communities. *Industry and Innovation* 15, 2 (April 2008), 145–168.

[102] Chorng-Guang Wu, James H Gerlach, and Clifford E Young. 2007. An empirical analysis of open source software developers' motivations and continuance intentions. *Information & Management* 44, 3 (2007), 253–262.

[103] Xuan Xiao, Aron Lindberg, Sean Hansen, and Kalle Lyytinen. 2018. "Computing" Requirements for Open Source Software: A Distributed Cognitive Approach. *Journal of the Association for Information Systems* 19, 12 (2018), 1217–1252.

[104] J Xie, M Zhou, and A Mockus. 2013. Impact of Triage: A Study of Mozilla and Gnome. In *International Symposium on Empirical Software Engineering and Measurement*. IEEE, 247–250.

[105] Yunwen Ye and Kouichi Kishida. 2003. Toward an Understanding of the Motivation of Open Source Software Developers. In *Proc. International Conference on Software Engineering* (Portland, Oregon) (ICSE '03). IEEE Computer Society, Washington, DC, USA, 419–429.

[106] Robert K Yin. 2017. Case study research and applications: Design and methods. Sage Publications.

A EMAIL INTERVIEW QUESTIONS

• Q1. How much do Rust roadmaps influence your decision about what work you contribute to the Rust project?
  No influence at all 1 2 3 4 5 A lot of influence
  Explain (optional)

• Q2. In your opinion, how helpful are roadmaps for the Rust community?
  Not at all helpful 1 2 3 4 5 Very helpful
  Can you explain in what way they are helpful or unhelpful? (optional)

• Q3. How much do Rust roadmaps (e.g. for working groups or projects) match your own priorities for Rust?
  Do not at all represent my priorities 1 2 3 4 5 Represent my priorities very well
  Explain (optional)

• Q4. How could the use of roadmaps in Rust be improved in the future?

• Q5. How many years have you been involved with Rust?

• Q6.
Have you been on any official Rust team or working group?
  Yes / No

B ROADMAP TOPIC HEURISTICS

We began by manually extracting a list of topics from the 2018 roadmap. To assign topics to particular issues, PRs, and RFCs, we used the following method:

• Two researchers independently compiled a list of topics from this document, identifying bullet points or lists in the text that appeared to identify specific features. One researcher's list was strictly longer (36 items) than the other's (23 items), so the two discussed each of the additional topics and included all but two of them, resulting in 34 topics.

• Using the generated list, one researcher proposed search keywords for each topic, using acronyms, distinctive terms, or word sequences found in that part of the roadmap that the researcher judged would have high selectivity in distinguishing text about that topic from general Rust discussion. The final list is shown in Table 6.

• Labels (short strings used by GitHub to tag issues, RFCs, and pull requests) were assigned to roadmap topics by applying the keywords to the labels' descriptions, as shown at https://github.com/rust-lang/rust/labels; for example, the label A-net was assigned to the topic "network services" because it matched the search term "networking". Both researchers checked through this list of labels and their descriptions and agreed that they matched the topics.

• This mapping was used to assign topics to all issues, PRs, and RFCs in rust (excluding so-called "Rollup" PRs). An issue, PR, or RFC was assigned to a topic if it was tagged with a label that mapped to that topic.

• Topics were also assigned to RFCs and tracking issues (a subset of issues formally tied to certain RFCs) if the search terms matched the item's title.

• We then spread activation from RFCs to issues, issues to PRs and RFCs, and PRs to issues: an issue inherits the topic of an RFC if the RFC lists the issue as an official tracking issue, and a PR inherits the topic of an issue if the PR mentions the issue ID in its initial description. This was not done recursively.

• We assigned a commit to a topic if it was part of a non-Rollup PR of that topic that was eventually merged into the main branch. We omitted commits with multiple parents (to avoid double-counting merges) and commits touching more than 100 files (to avoid commits that were mass moves of files).

• "Discussion effort" was operationalized as characters of text in the header and comment thread of each RFC discussion, issue, or PR, excluding code embedded in those comments (which is delimited by triple backticks).

• "Coding effort" was operationalized as lines of code deleted plus lines of code added.

• "Team contributors" were operationalized as anyone who was a member of one of the teams listed on Rust's governance page at the beginning of 2018.
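To make the keyword step concrete, the following is a minimal sketch of how the OR-of-ANDs search terms in Table 6 below could be applied to label descriptions or titles, together with the "discussion effort" measure. The topic subset shown, the function names (`term_matches`, `topics_for`, `discussion_effort`), and the prefix-match treatment of a trailing `*` are our illustrative assumptions, not the exact code used in the study.

```python
import re

# Hypothetical subset of Table 6, encoded as OR-of-ANDs: a text matches
# a topic if, for at least one alternative (OR), all of that
# alternative's terms (AND) occur; a trailing "*" is a prefix wildcard.
TOPIC_TERMS = {
    "non-lexical lifetimes": [["NLL"], ["non", "lexical", "lifetime*"]],
    "custom allocator": [["custom", "allocator*"]],
    "network services": [["networking"]],
}

def term_matches(term, text):
    """True if `term` occurs as a word in `text` (prefix match on '*')."""
    if term.endswith("*"):
        pattern = r"\b" + re.escape(term[:-1]) + r"\w*"
    else:
        pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None

def topics_for(text):
    """All topics whose OR-of-ANDs term expression matches `text`."""
    return {
        topic
        for topic, alternatives in TOPIC_TERMS.items()
        if any(all(term_matches(t, text) for t in alt) for alt in alternatives)
    }

# "Discussion effort": characters of comment text, excluding code
# embedded between triple-backtick fences.
FENCE = "`" * 3
CODE_FENCE = re.compile(re.escape(FENCE) + r".*?" + re.escape(FENCE), re.DOTALL)

def discussion_effort(comments):
    return sum(len(CODE_FENCE.sub("", c)) for c in comments)

# Example: a GitHub label description maps to a roadmap topic.
print(topics_for("Area: non-lexical lifetimes (NLL)"))  # {'non-lexical lifetimes'}
```

Applied to a label description such as the one in the example, the matcher reproduces the kind of assignment described above (e.g., A-net matching "networking"); issues, PRs, and RFCs would then inherit topics from the labels they carry.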
Also note that some development happened outside these repositories; for example, there is a rust-lang/cargo repository; we only capture aspects of development that affect the main compiler project.

Table 6. Search terms for identifying 2018 roadmap topics in labels and text. The left and middle columns are used as search terms within the descriptions of labels; the right column shows the labels that matched.

| 2018 Topic | Search Terms | Labels |
|---|---|---|
| add edition flag to rustfix | (edition AND rustfix) OR (2018 AND lint* AND rustfix) | |
| async/await | (async AND await) OR (async/await) | A-async-await, AsyncAwait-Triaged, AsyncAwait-Focus, AsyncAwait-OnDeck, F-async_await |
| build system integration | | |
| cargo custom registries | (Cargo AND registry) OR (Cargo AND registries) | A-registry |
| Cargo/Xargo integration | cargo AND xargo | |
| CLI apps | (CLI app*) OR (CLI application*) OR (command AND line AND app*) OR (command AND line AND application*) | |
| Clippy | (Clippy AND rustup) OR (Clippy AND 1.0) OR (Clippy AND 1 AND 0) | A-lint |
| compiler optimizations | (optimization*) OR (optimisation*) OR (optimize) OR (optimise) | A-optimization, A-LLVM, A-mir-opt |
| compiler parallelization | (parallelization) OR (parallelisation) | |
| compiler-driven code completion for RLS | (auto-complete AND RLS) OR (completion AND RLS) | |
| const generics | | A-const-generics, F-const_generics |
| custom allocator | custom AND allocator* | A-allocators |
| custom test frameworks | custom AND test AND framework* | F-custom_test_frameworks |
| embedded device | embedded | WG-embedded |
| GATs | (generic AND associated AND type*) OR (associated AND type AND constructor*) | F-generic_associated_types |
| generator | | A-generators, F-generators |
| improve compiler error message | error* AND message* | A-diagnostics, F-on_unimplemented |
| incremental compilation | incremental AND compilation | A-incremental, A-incr-comp, WG-compiler-incr |
| internationalization | (internationalization) OR (internationalisation) | |
| macros 2.0 hygiene | (macro* AND hygiene) OR (macro* AND 2.0) OR (macro* AND 2 AND 0) OR (hygiene) | A-hygiene, A-macros-2.0 |
| MIR-only rlibs | MIR AND rlib* | |
| modules revamp | modules | A-modules |
| network services | networking | A-net |
| non-lexical lifetimes | (NLL) OR (non AND lexical AND lifetime*) OR (non-lexical AND lifetime*) | A-NLL, NLL-complete, NLL-diagnostics, NLL-fixed-by-NLL, NLL-performant, NLL-polonius, NLL-reference, NLL-sound |
| public dependencies in cargo | (cargo AND libstd) OR (cargo AND std) OR (cargo AND xargo) | |
| revise cargo profiles | cargo AND profile* | A-profile |
| RLS 1.0 | RLS | A-language-server, A-rls |
| rustdoc RLS-based edition | RLS AND rustdoc | |
| rustfmt | rustfmt | |
| Ship or drop ergonomics RFCs | (ergonomics AND rfc) OR (ergonomics AND initiative) | Ergonomics Initiative |
| SIMD | | A-simd, F-simd_ffi |
| stabilize impl Trait | impl Trait | A-impl-trait, F-impl_trait_in_bindings, F-type_alias_impl_trait |
| tokio | | |
| web assembly | (webassembly) OR (wasm) OR (web assembly) | O-wasm |

Received June 2020; revised October 2020; accepted December 2020
{"id": "a3f7dbdf1d323ae00adeb70cbe6169d7342b3991", "text": "Changes in free and open source software licenses: managerial interventions and variations on project attractiveness\n\nCarlos Denner dos Santos Jr\n\nAbstract\n\nThe license adopted by an open source software is associated with its success in terms of attractiveness and maintenance of an active ecosystem of users, bug reporters, developers, and sponsors because what can and cannot be done with the software and its derivatives in terms of improvement and market distribution depends on legal terms there specified. By knowing this licensing effect through scientific publications and their experience, project managers became able to act strategically, loosening up the restrictions associated with their source code due to sponsor interests, for example; or the contrary, tightening restrictions up to guarantee source code openness, adhering to the \u201cforever free\u201d strategy. But, have project managers behaved strategically like that, changing their projects license? Up to this paper, we did not know if and what types of changes in these legal allowances project managers have made and, more importantly, whether such managerial interventions are associated with variations in intervened project attractiveness (i.e., related to their numbers of web hits, downloads and members). This paper accomplishes these two goals and demonstrates that: 1) managers of free and open source software projects do change the distribution rights of their source code through a change in the (group of) license(s) adopted; and 2) variations in attractiveness are associated with the strategic choice of a licensing schema. To reach these conclusions, a unique dataset of open source projects that have changed license was assembled in a comparative form, analyzing intervened projects over its monthly periods of different licenses. Based on a sample of more than 3500 active projects over 44 months obtained from the FLOSSmole repository of Sourceforge.net data, 756 projects that had changed their source code distribution allowances and restrictions were identified and analyzed. A dataset on these projects\u2019 type of changes was assembled to enable a descriptive and exploratory analysis of the types of license interventions observed over a period of almost four years anchored on projects\u2019 attractiveness. More than 35 types of interventions were detected. The results indicate that variations in attractiveness after a license intervention are not symmetric; that is, if a change from license schema A to B is beneficial to attractiveness, a change from B to A is not necessarily prejudicial. This and other interesting findings are discussed in detail. In general, the results here reported support the current literature knowledge that the restrictions imposed by the license on the source code distribution are associated with market success vis-a-vis project attractiveness, but they also suggest that the state-of-the-science is superficial in terms of what is known about why these differences in attractiveness can be observed. The complexity of the results indicates to free software managers that no licensing schema should be seen as the right one, and its choice should be carefully made, considering project strategic goals as perceived relevant to stakeholders of the application and its production. 
These conclusions create awareness of several limitations of our current knowledge, which are discussed along with guidelines for addressing them in future research endeavors.

Keywords: Open source software, Attractiveness, Software license, Intellectual property, GPL, Free software, Governance, Project and people management, Information technology, Software project, Open source

Correspondence: carlosdenner@unb.br
Department of Management (PPGA/ADM), University of Brasilia (UnB), Brasília, Brazil

© The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

1 Introduction: collective production and legal issues

Society and its creations have become increasingly complex as our body of knowledge has grown and information retrieval technologies have evolved. Innovating and competing on a global scale is not an activity for an individual alone. Searching for partners and peers to collaborate with on projects is a crucial task in most fields, notably in science, software engineering, and public policy management [1–3]. Experts have noticed this, observing that modern inventors are organizations, not individuals, and that production processes are best dealt with in an open and public fashion, as opposed to the proprietary and private economic model of firm production [3–5]. This change, of course, raises concerns about how the rights to such collective goods (properties) should be regulated and managed so as to prevent disincentives for entrepreneurship and cooperation, and thus keep the labor market active and sustainable [6–8].

The digitalization of the world has stimulated this trend of working in collectivities by decreasing the costs of searching for collaborators and by providing communication technologies to coordinate production activities. The asynchronicity of production activities over the web has led many investigators and developers to engage in geographically distributed projects, such as software development [9, 10]. For at least the last 20 years, this phenomenon of "collective production" has been particularly prominent in the development of free and open source software (free software, for short), reshaping the information technology (IT) industry as it became a strategic player. Nowadays, there are hundreds of thousands of free software projects online, each representing a computer-supported cooperative work opportunity for generating an active and growing ecosystem of users and contributors capable of joint development at an unprecedented scale [11, 12].

Free software projects (FSP) reflect the intention of a founder, the original owner of the property rights, to share the costs of continuous software improvement, user base expansion, and visibility growth [13–15]. The ability to attract peers to co-create with the founder is understood as the attractiveness of the project [12]. Richard Stallman and Linus Torvalds are among the first and most famous people to publicize this type of intention, bringing forth the GNU operating system and Linux, an incredibly successful project that alone deeply impacted the IT industry.
Unsurprisingly, inspired by the Linux case, many organizations have created FSP as a deliberate organizational strategy, known as open sourcing, an alternative to the classic outsourcing option [11]. When successful, FSP involve active communities structured as networks for the evolution of public software through a resourceful communication channel between users, developers, and sponsors. Nevertheless, in these terms, success has been achieved by only a small fraction of all FSP, making the investment of releasing intellectual property to the public and assembling a proper IT infrastructure risky and worthy of managerial consideration, as a failed attempt wastes an organization's limited resources [12–16].

In this scenario of uncertainty and competition over whether the attention of users and developers will be obtained, knowledge on how to effectively create and manage FSP to better suit the demands and interests of stakeholders, be they sponsors or co-developers, is useful and timely. Founders and managers should take stakeholder demands and interests into account, as they expect that to translate into increasing software adoption and intention to contribute (i.e., people reporting and developers fixing bugs). One of the central issues in the open source project literature affecting intention to adopt and contribute, and hence attractiveness, is the license terms: the legal specifications under which the software has been released to regulate further improvement and distribution [6, 7, 16–18].

The influence of the license choice has been discussed on many grounds, from legal [6], strategic [3, 8], and sociological [7] standpoints. The main effects can be summarized as related to people's motivation to get involved, as some in the community (stakeholders) believe that private property should not be a derivative of a public one; conversely, such a legal restriction has been found to scare corporate investment away from software obliged to remain always free and open (e.g., software licensed GPL 2.0). This duality of effects creates a tension in which the interests of all cannot be met at once, forcing FSP managers to choose a strategic path and "pick a side" in terms of licensing and distribution rights.

A major concern has been the terms under which the application's source code is allowed to be modified and redistributed. Free software can be modified, and the result of that modification distributed in sold hardware, for example, with the source code of the embedded software kept proprietary, or not, depending on the license chosen. According to previous studies, the intellectual property policy delineated by the chosen license schema has the power to drive people and organizations away from adopting and contributing to FSP, and it operates as a governance mechanism, thereby impacting the attractiveness of the project and consequently its production activities [6–8, 12, 17–19].

In a nutshell, the license is believed to influence FSP attractiveness, production activities and, thereby, success. As this strategic effect becomes known to FSP founders and managers, and assuming their rationality in attempting to be successful, we should expect them to act in practice, changing their project licenses to affect attractiveness.
This paper represents a methodological advance over previous studies, as it verifies this theoretically derived expectation of a relationship between license and attractiveness by performing a longitudinal study with a large sample observed in natura over a wide time frame. This methodological approach was specifically developed to answer the following research questions: 1) Do intellectual property interventions, i.e., license changes, occur in practice? 2) Are the different licensing schemas chosen by project managers associated with FSP attractiveness? These questions are answered with a sampling strategy designed to identify the projects that have changed licenses, followed by a statistical analysis of the various types of license interventions that FSP managers have decided to make, thereby changing the legal restrictions on their software (and thereby, possibly, their project's attractiveness). Beyond this methodological improvement, this paper also contributes by recognizing that most previous empirical studies have assumed an open source project carries only one license, even though many projects carry more than one. This paper incorporates that fact into its methodological procedures and improves the classic way of classifying licenses, based on Lerner and Tirole's work, into a more realistic, empirically grounded schema. Furthermore, the unique dataset assembled for this paper is released openly, free of charge, along with its publication, as another contribution to future research endeavors (Additional file 1).

The scientific basis grounding these theoretical expectations is stated next in more detail, a foundation followed by a methods section describing the specific steps taken to obtain the sample, and by the results, discussed before the conclusions.

1.1 Theoretical foundations: definitions and related work

1.1.1 Free and open source software projects

In general, projects are endeavors toward goals, such as writing a paper or developing software. When a software project has its source code freely and publicly available online for use and modification, with an attached license specifying that, it may be classified as a free and open source software project [7, 8, 11, 12]. Free software projects (FSP) are the object of interest of this study because of their position as key players in the IT industry. Several of them have become widely known, such as the GNU/Linux operating system, the R statistical package, and the Apache web server. The communities maintaining these systems are large, active and professional, producing first-class applications in their domains and receiving sponsorship from companies such as IBM and Google. However, beyond these high-profile applications, most FSP have not become successful, never attracting the external users and contributors needed to generate a network of peers producing useful, up-to-date public software freely available [12–14].

1.2 The role of attractiveness

One way to understand why some FSP are successful and others are not is through the study of their attractiveness [12], or their "magnetism and stickiness", as some have more informally put it. Attractiveness is a common cause of how many visitors a project website receives, how many users it has (or its number of downloads), and how many contributors it possesses.
FSP attractiveness is a concept held responsible for the (lack of) flow of market resources, basically time and money, to the project. Higher attractiveness leads to more intention to adopt (download) and contribute (become a member), motivating and justifying production activities and investments in the software to improve quality and generate innovation via the "more eyeballs" effect [12, 19, 20]. FSP attractiveness plays a vital role in this perspective, and it is thus evidently important to understand what influences, or is associated with, variations in attractiveness.

1.3 The choice of license and FSP success

The choice of license impacts FSP success because it defines the scope for doing business with the distribution of the software and its derivatives, perhaps preventing source code hijacking, or affecting the reuse or "citation" incentive, but certainly influencing stakeholders' perception of control and utility over the technology. People and organizations take the license terms into consideration when deciding whether to adopt and use free software and, later, whether it is worth contributing to or reusing the source code [7, 8, 16, 21]. Figure 1 depicts the causal chain of this thesis, from intellectual property choice to attractiveness and then to software quality/project success.

In summary, based on the literature review in which this study is grounded [8, 12], Fig. 1 can be read from left to right as follows: FSP managers select a license that defines the restrictions applied to source code redistribution, which affects the flow of market resources to the project (visits to the website: visitors; downloads: intention to use; and membership: intention to contribute). As a consequence of an increase in project attractiveness, with more people interested in the software's quality, more bugs will be reported and fixed, and new features will be requested and developed, directly influencing the project's long-term success. Accordingly, this causal chain is expected to be "disturbed" by a managerial intervention, a change in the project license, as the interests of relevant stakeholders (sponsors, volunteers, etc.) might no longer be met.

To explore this hypothesis empirically, building on previous research [8, 12, 21, 22], this study focuses on four types of legal restrictions that may be applied to free and open source code. The first concerns whether the source code is "restrictive", requiring derivative works to be released under the same license in case of redistribution [19]; the second, whether it is "highly restrictive", which, besides being restrictive, forbids the source code from even being mingled during compilation with software under a different license [19]; the third, whether the code may be relicensed, meaning that "any distributor has the right to grant a license to the software [...] directly to third parties" ([7], p. 88); and the fourth, whether the project is licensed under the Academic Free License (AFL), which was written to correct problems in important licenses such as MIT and BSD [7] and is understudied. Methodologically speaking, project licenses were classified on this basis, including the cases where a project has more than one license. Therefore, in this schema, a project might not impose a restriction on one group of stakeholders, students for example, yet impose it on corporations.
This methodological choice reflects the reality of open source projects more accurately, but has the downside of greater complexity, as the results will demonstrate.

The basic sampling idea that guided this research was to look for projects that underwent a change in these legal terms during their life-cycle and to verify possible associations with, or variations in, the main indicators of attractiveness of such projects. This approach aims to uncover whether FSP managers change legal restrictions over their projects' life-cycle (research question #1, RQ1) and to evaluate whether the success of FSP is associated with changes in legal terms, through a before-and-after statistical analysis of a managerial intellectual property intervention (IPI) on project attractiveness (research question #2, RQ2). These two aims together have not been addressed in previous research with such a methodological approach.

2 Methods: data, sampling and statistical analyses

To obtain data capable of answering the questions of whether FSP managers have changed their schemas of licensing over the years (RQ1), and whether these changes are associated with project attractiveness (RQ2), a search for secondary data on free software projects was conducted on the internet. A few options came up, such as the University of Notre Dame repository, but the seemingly most straightforward one was chosen: FLOSSmole [23]. Data obtained and released by FLOSSmole on all projects from the largest free software repository available online [6] at the time of this project's data collection was organized in a database for inspection, covering 44 months of activity. This database was filtered down to contain only those projects that changed their listed licenses over the years covered by the dataset. If this filtered dataset had contained zero projects, the answer to the first research question would have been "no, FSP managers have not changed their license schema, despite the known effect of licensing on attractiveness found in previous research". But the empirical answer is yes: FSP managers have made these interventions (IPI) hundreds of times in this research sample.

After obtaining this working sample, a data organization process was performed, classifying the various licenses of projects (many have more than one license at a given point) into the categories described right after Fig. 1 above. All information on the project audience (end-user or developer, for example), date of creation, etc. was also kept for sample description, and data on the numbers of web hits, downloads, and members were gathered monthly to allow comparisons on these indicators of attractiveness, anchored on the type of licensing schema intervention. The choice of these specific indicators is aligned with previous research [12], where attractiveness was first directly addressed in the specialized literature. A few more details on this data preparation procedure are described below.

The sampling and filtering procedures adopted were specifically designed to detect the changes in license terms made by FSP managers and to explore whether these IPI are associated with variations in FSP attractiveness; a small sketch of the filtering step follows.
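To make this filtering step concrete, here is a minimal sketch in R (the language the analyses were organized in), assuming a hypothetical long-format table `lic` with one row per project, month and listed license, FLOSSmole-style; the table and column names are illustrative, not the original scripts.

```r
library(dplyr)

# lic: columns project_id, month, license (a project-month may list
# several licenses). Collapse each project-month into one sorted
# "schema" string, then keep only projects whose schema ever changes.
schema_by_month <- lic %>%
  group_by(project_id, month) %>%
  summarise(schema = paste(sort(unique(license)), collapse = " + "),
            .groups = "drop")

changed <- schema_by_month %>%
  group_by(project_id) %>%
  summarise(n_schemas = n_distinct(schema), .groups = "drop") %>%
  filter(n_schemas > 1)          # candidate IPI projects

working_sample <- semi_join(schema_by_month, changed, by = "project_id")
```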
Because the ideal methodological situation, randomly assigning projects to undergo a license change, is not possible (one cannot do that with other people's projects; this is not an experiment), projects whose listing categories or audiences changed during the period covered by this study were, as an alternative way to control for confounding effects, also excluded. In addition, any project with missing data on the number of members was removed from the sample, as this indicates an "orphan" project. The working sample consists of 756 FSP with monthly data covering a period of 44 months, from October 2005 to June 2009 (one month, July 2008, was missing in FLOSSmole).

For each project, monthly data on its licenses were collected for classification based on the legal restrictions covered in this paper, as explained before. The classification set forth here is based on previous research, which has always treated licenses by their restrictions: 1) compatibility for mingling with software under a different license during compilation (when not, referred to as "highly restrictive"); 2) whether an improvement of the software must itself be released as free software (when yes, referred to as "restrictive"); and 3) whether the software may be relicensed by a third party under a license different from the one originally chosen (when yes, referred to as "relicensable"). However, the empirical fact that projects have more than one license challenges a classification that considers a project based on only one of its licenses. Free software projects choose schemas of licensing, for example with a "highly restrictive" stamp for non-payers and a "relicensable" option for those who pay for the software. The classification adopted here takes that into account to obtain a more accurate, albeit more complex, picture of projects' licensing schemas. All of a project's listed licenses were considered, so a dual-licensed project might indeed be "Restrictive, Highly Restrictive and Relicensable", something that at first sight can appear contradictory. This classification was performed per month, and changes in the schema, i.e., detected managerial interventions, were flagged for further analysis.

3 Results and Findings: descriptive statistics towards RQ1

Table 1 summarizes all interventions detected, along with the labels given to them (see column "description"); the number of occurrences of each type of change in legal terms is displayed in the table cells. This table represents the detailed answer to RQ1. One can see, for example, that the GPL was involved in managerial interventions 715 times (being the end state 298 times, the sum of column F, and the beginning state of the change 417 times, the sum of row F). In the description column, one can see that the GPL is restrictive and highly restrictive; that is, redistributed derivative work must be GPL as well, and source code mingled with it during compilation must be GPL as well (a "viral" license). Further, GPL software cannot be relicensed under a different license. The GPL is thus restrictive, highly restrictive and non-relicensable. The GPL motivates the most managerial interventions, probably due to its popularity and the community's mixed feelings about its adoption (loved by those who believe in "free software forever", less so by those primarily guided by competitive motivations).
This GPL leadership is followed by the dual-licensing strategy, whereby FSP managers decide to release code under different licenses depending on the interest and profile of the user (e.g., whether an individual or a for-profit organization). The ranking of these interventions and the number of their occurrences can be found in Table 1, whose columns hold the data for the new license type adopted and whose rows hold the data for the license type abandoned by the project (the "from" and "to" indicated in the first cell).

Table 1 Count of license type interventions in the sample

| From \ To (description) | A | B | C | D | E | F | G | Sum | Ranking |
|---|---|---|---|---|---|---|---|---|---|
| A None (or "other") | 0 | 22 | 2 | 13 | 3 | 47 | 1 | 88 | 5 |
| B Non-Restrictive and Relicensable (e.g., Public Domain or MIT) | 8 | 0 | 7 | 20 | 16 | 31 | 45 | 127 | 4 |
| C Academic Free License, AFL (Non-Restrictive and Relicensable) | 2 | 5 | 0 | 0 | 0 | 7 | 0 | 14 | 7 |
| D Restrictive and Non-Relicensable (e.g., GNU Lesser General Public License, LGPL) | 6 | 34 | 0 | 0 | 21 | 67 | 6 | 134 | 3 |
| E Restrictive and Relicensable (e.g., Mozilla Public License, MPL) | 3 | 19 | 0 | 12 | 0 | 7 | 8 | 49 | 6 |
| F Restrictive, Highly Restrictive and Non-Relicensable (e.g., GNU General Public License, GPL) | 36 | 81 | 3 | 137 | 5 | 0 | 155 | 417 | 1 |
| G Restrictive, Highly Restrictive and Relicensable (e.g., dual licensed: GPL and Apache) | 0 | 32 | 0 | 6 | 6 | 139 | 0 | 183 | 2 |
| Sum | 55 | 193 | 12 | 188 | 51 | 298 | 215 | 1012 | |
| Rank | 5 | 3 | 7 | 4 | 6 | 1 | 2 | | |

Source: Author's own

Additionally, monthly data on web hits (visitors), downloads (intention to install and use the software) and number of members (intention to contribute by reporting bugs or requesting features), besides the type of project and its development stage, were gathered. Table 2 contains the descriptive statistics for the numerical variables, and Table 3 the frequency of projects with a particular type of license versus their development status in the first month of the dataset, October 2005.

To calculate "attractiveness", a latent construct, the correlation matrix of a previous study [12] was used in a principal component analysis [24], in which a linear combination of three indicators of attractiveness was identified so as to maximize the explained variance. The first principal component extracted is operationally defined as (0.63 × log(web hits) + 0.64 × log(downloads) + 0.43 × log(members)) and explains 65% of the sample variance. This first component was used to calculate a new variable named attractiveness, the weighted sum of a project's log-transformed web hits, downloads and number of members in any given month. This measure of attractiveness expresses the ability of a project to attract these market resources from the environment in which it competes with other projects; attractiveness is thus modeled as a common cause of website visits, downloads and membership numbers. Data was organized and statistically analyzed with R (a minimal sketch of this computation is shown after the next paragraph).

From Table 2, one can see that in the sample: 1) projects were founded as early as 1999; 2) on average a project had approximately 378 downloads in October 2005; and 3) at least one project had four different licenses listed at that point.
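As a concrete illustration of the attractiveness computation, here is a minimal sketch assuming a data frame `d` with one row per project-month and columns `webhits`, `downloads` and `members`; using `log1p` to sidestep zero counts is an assumption, since the paper does not state how zeros were handled.

```r
# First principal component reported above, used as the attractiveness score:
# 0.63*log(webhits) + 0.64*log(downloads) + 0.43*log(members)
attractiveness <- function(webhits, downloads, members) {
  0.63 * log1p(webhits) + 0.64 * log1p(downloads) + 0.43 * log1p(members)
}

d$attractiveness <- attractiveness(d$webhits, d$downloads, d$members)

# The loadings themselves come from a PCA on the log-scaled indicators;
# with raw data at hand one would obtain them along the lines of:
#   pca <- prcomp(log1p(d[, c("webhits", "downloads", "members")]), scale. = TRUE)
#   summary(pca)   # first component explains ~65% of the variance
```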
Table 3 depicts a different picture, showing that: 1) 48% of projects (363) are licensed GPL (restrictive + highly restrictive + non-relicensable), and 95 of these are in beta stage; 2) 11% of the 756 projects have no license specified; and 3) only 7 projects had neither a license nor a development status on file in October 2005. This distribution of projects over the various stages of the software lifecycle demonstrates wide variability, mitigating once more the limitations of the non-experimental nature of this study and its potential sampling biases.

4 Results and Findings: preparing to answer RQ2

To explore the associations of IPI with attractiveness variations and obtain statistical evidence of variation, if any, FSP were classified according to the type of intervention they were subject to each month, and the working sample was again organized and analyzed as follows.

To allow statistical comparisons with reasonable sample sizes, the dataset was reorganized so that the seven licensing schemas, A to G, appear as columns and month-project attractiveness values as rows. In this new dataset, each cell represents the attractiveness of a project in a specific month, broken down by licensing schema across the columns (a minimal sketch of this reshaping is shown after the tables below). This analytical strategy of treating the licensing schema, rather than the specific change of schema, increased the sample size immensely and permitted statistical mean comparisons of attractiveness, as RQ2 required.

Table 2 Descriptive statistics for the numerical variables (first month, October 2005)

| Variable | Minimum | Maximum | Mean | Std. deviation |
|---|---|---|---|---|
| Registered | 11/04/1999 | 3/13/2009 | 1/08/2003 | – |
| n_licenses.200510 | 0 | 4 | 1.10 | 0.39 |
| attractiveness.200510 | 0 | 16.12 | 5.4694 | 3.33 |
| downloads.200510 | 0 | 34,514 | 378.07 | 1941.56 |
| webhits.200510 | 0 | 836,740 | 9267.23 | 48,727.7 |
| members.200510 | 1 | 55 | 3.38 | 5.22 |

Source: Author's own

Table 3 Type of license versus development status in October 2005

| Type of license | | Alpha | Beta | Mature | None | Planning | Prealpha | Stable | Total |
|---|---|---|---|---|---|---|---|---|---|
| A | # | 12 | 18 | 2 | 7 | 7 | 7 | 33 | 86 |
| | % | 1.6% | 2.4% | 0.3% | 0.9% | 0.9% | 0.9% | 4.4% | 11.4% |
| B | # | 25 | 36 | 4 | 2 | 15 | 11 | 33 | 126 |
| | % | 3.3% | 4.8% | 0.5% | 0.3% | 2.0% | 1.5% | 4.4% | 16.7% |
| C | # | 1 | 1 | 0 | 2 | 1 | 0 | 3 | 8 |
| | % | 0.1% | 0.1% | 0% | 0.3% | 0.1% | 0% | 0.4% | 1.1% |
| D | # | 22 | 33 | 2 | 5 | 15 | 13 | 38 | 128 |
| | % | 2.9% | 4.4% | 0.3% | 0.7% | 2.0% | 1.7% | 5.0% | 16.9% |
| E | # | 4 | 10 | 0 | 1 | 1 | 3 | 6 | 25 |
| | % | 0.5% | 1.3% | 0% | 0.1% | 0.1% | 0.4% | 0.8% | 3.3% |
| F | # | 84 | 95 | 9 | 8 | 36 | 38 | 93 | 363 |
| | % | 11.1% | 12.6% | 1.2% | 1.1% | 4.8% | 5.0% | 12.3% | 48.0% |
| G | # | 4 | 7 | 0 | 0 | 2 | 3 | 4 | 20 |
| | % | 0.5% | 0.9% | 0% | 0% | 0.3% | 0.4% | 0.5% | 2.6% |
| TOTAL | # | 152 | 200 | 17 | 25 | 77 | 75 | 210 | 756 |
| | % | 20.1% | 26.5% | 2.2% | 3.3% | 10.2% | 9.9% | 27.8% | 100% |

Source: Author's own
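A minimal sketch of this reorganization, reusing the hypothetical `schema_by_month` table from the earlier sketch (with `schema` now holding the A-G code) joined to a `monthly` table of attractiveness values; all names are illustrative.

```r
library(dplyr)
library(tidyr)

# One row per month-project observation; one attractiveness column per
# licensing schema (A..G), NA where the schema does not apply that month.
by_schema <- monthly %>%                      # project_id, month, attractiveness
  inner_join(schema_by_month,                 # project_id, month, schema (A..G)
             by = c("project_id", "month")) %>%
  pivot_wider(names_from   = schema,
              values_from  = attractiveness,
              names_prefix = "attr_")

colMeans(by_schema[, -(1:2)], na.rm = TRUE)   # mean attractiveness per schema
```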
The classic t-test, robust to violations of its assumptions with such large samples, was performed using the software SPSS.

The descriptive statistics, variable by variable, for this new dataset are shown below in Table 4. The smallest sample size is 265, which means that of the 33,264 month-projects available (756 projects × 44 months), only 265 month-projects could be flagged with a C-type schema.

5 Results and Findings: revisiting RQ1 towards RQ2

A project's license, or schema of licensing, imposes restrictions and allowances on the application adopter and on the source code contributor, the creator of a derivative work. For example, a company that customizes a GPL application and distributes it in the market is obliged to make public the source code of the redistributed, improved software. The license choice is a strategic decision with social and economic impacts on the project, as it can block the interests of people related to the software, that is, users, developers and other relevant stakeholders. A major decision like this is not expected to occur very often, as managers avoid status quo changes that harm expectations and divert people's attention from the actual work (e.g., into politics and disputes). This tendency not to change strategic matters is known in the organizational literature as structural inertia [25].

Consistent with such organizational inertia, out of the thousands of free software projects obtained from FLOSSmole and SourceForge.net and analyzed in this research, only 756 changed their license type over the 44 months covered, from October 2005 to June 2009 (July 2008 missing). Nevertheless, as already shown in Table 1, these 756 projects changed licenses 1012 times, a considerable number that validates the theoretical expectation of managerial action through changes in software legal restrictions towards meeting stakeholders' demands and expectations for project success. Previous research has stated that the license affects the probability of project success and, accordingly, FSP managers have indeed attempted changes in legal restrictions.

In terms of specific results, the managerial decision of not having a license specified, which leaves projects exposed and legally unattended, was detected in both directions: projects left the "none" choice 88 times and, surprisingly, moved from having a license to having no license 55 times (see Table 1, type of license A). In fact, projects with no license specified were found in every month covered by this research. FSP with no license (the A category created for the "none" choice) often have lower average attractiveness than restrictive/relicensable and dual-licensed projects, but higher attractiveness than GPL projects (the F schema). Let us now move one step further and analyze the data numerically.

To initially explore the statistical associations between attractiveness and license, the ratios of mean attractiveness after/before interventions were computed, considering all projects of a given change in licensing schema (summarized in Table 5). To calculate the ratios, after the attractiveness component was computed for standardization, attractiveness was summed over all projects in a given state of license change, for each type of change.
Projects were aggregated, and one ratio was then calculated by dividing their mean attractiveness after the change by their mean attractiveness before the change.

To interpret the results in Table 5: for example, the ratio of 0.94 in the first row indicates that projects changing from license type A to B experienced lower attractiveness after the intervention; that is, moving away from having no license (A) to a "public domain" license (B) is on average detrimental to attractiveness (specifically, a reduction of 6%). However, that strategic move was detected only 22 times in the sample (see Table 1), limiting any robust statistical analysis of such variation in attractiveness. This limitation is overcome later in the analysis with the t-tests described in the methods section.

Table 5 Ratios of mean attractiveness after/before each type of intervention

| From* \ To | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| A^b | – | 0.94 | 1.07 | 1.06 | 1.14 | 1.09 | 0.87 |
| B^cdf | 0.96 | – | 0.97 | 1.02 | 1.03 | 0.98 | 1.01 |
| C^df | 0.92 | 0.93 | – | – | – | 1.05 | – |
| D^beg | 0.98 | 1.05 | – | – | – | 0.96 | 1.03 |
| E^dg | 0.70 | 0.86 | – | 0.91 | – | 0.89 | 0.89 |
| F^bc | 0.89 | 1.00 | 2.00 | 0.98 | 1.06 | – | 1.01 |
| G^de | – | 0.85 | – | 0.98 | 0.88 | 0.89 | – |

*Superscript letters indicate an asymmetric effect of interventions, that is, pairs for which going from one license to the other has a similar rather than opposite effect in both directions (e.g., it is both positive to leave C for F and to leave F for C)

Source: Author's own

Table 4 Descriptive statistics for mean comparisons by licensing schema

| License type | Sample size | Minimum | Maximum | Mean | Std. dev. |
|---|---|---|---|---|---|
| A_attractiveness | 2134 | 0.44 | 14.60 | 6.9037 | 2.81252 |
| B_attractiveness | 5322 | 0.00 | 16.12 | 6.4007 | 2.75651 |
| C_attractiveness | 265 | 0.00 | 11.12 | 6.4196 | 2.09862 |
| D_attractiveness | 5522 | 0.30 | 14.46 | 6.7547 | 2.31416 |
| E_attractiveness | 1073 | 0.30 | 15.97 | 7.4004 | 2.74449 |
| F_attractiveness | 9849 | 0.00 | 16.83 | 6.6443 | 2.88175 |
| G_attractiveness | 1865 | 0.44 | 18.01 | 7.6265 | 3.08157 |

Source: Author's own

Moving ahead with this exploratory interpretation, consider the odd managerial action of moving from having a license specified to having none (changes with "A" as target): the average attractiveness ratio of projects that underwent this type of change was always below one (column A of Table 5), suggesting that stakeholders dislike the uncertainty associated with a project that has no license. Every time such a change was made, average project attractiveness decreased (a ratio smaller than one indicates that attractiveness after the change is, on average, lower than before; a minimal sketch of this ratio computation is shown below).
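As a sketch of this after/before ratio, assume a hypothetical `events` table with one row per detected intervention (`project_id`, `month`, `from`, `to`) alongside the `monthly` table from the earlier sketches; again, the names are illustrative, not the paper's scripts.

```r
library(dplyr)
library(tidyr)

# For each intervention type (from -> to), compare mean attractiveness in
# the months at or after the change against the months before it.
# Months are assumed encoded as sortable integers (e.g., 200510).
ratios <- events %>%
  inner_join(monthly, by = "project_id", suffix = c("_evt", "_obs")) %>%
  mutate(period = if_else(month_obs >= month_evt, "after", "before")) %>%
  group_by(from, to, period) %>%
  summarise(mean_attr = mean(attractiveness), .groups = "drop") %>%
  pivot_wider(names_from = period, values_from = mean_attr) %>%
  mutate(ratio = after / before)   # e.g., A -> B comes out near 0.94
```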
Additionally, when a project went from none to a restrictive and relicensable choice (A → E), the change was associated with an average increase of 14% in attractiveness.

From a distinct perspective, interestingly, the interventions from none to non-restrictive and relicensable (e.g., MIT) and from none to restrictive, highly restrictive and relicensable (i.e., dual licensed) led to an attractiveness reduction (see A → B and A → G in Table 5). At this moment, one can only speculate about the actual reasons for such findings in a case-specific manner, but the general theoretical interpretation is that relevant stakeholders' interests were harmed by the project license change, affecting its consequent attractiveness.

Together, these findings related to the managerial decision of having no license specified can probably be interpreted in several ways. One reading is that the market does not welcome unregulated software, which is more exposed to litigation, given that a managerial change to having no license specified is always detrimental. From another perspective, however, projects with no license can still be considered attractive, suggesting the possibility that the regular user does not take the license into account at all. Perhaps both explanations are valid and complementary, as the attractiveness measure adopted in this research groups the effects on developers and users together (downloads and membership numbers), and only future research can sort this out. Attractiveness is a cause that these variables have in common, but most likely not the only one (the first principal component extracted, for example, explains 64% of the variance, so 36% is not due to this attractiveness measure). Future studies can dig into this line of inquiry by studying these indicators separately as well.

Back to the interpretation of results: focusing on the most popular choice, the GPL, or more generically the most restrictive licensing (i.e., restrictive, highly restrictive and non-relicensable, the F schema), abandoning this scheme of source code regulation was found beneficial to projects in terms of attractiveness increase. Overall, a positive variation in attractiveness came with such a change, but the strategic move was detrimental to FSP attractiveness when projects went to "none" (A) or to restrictive and non-relicensable (D), that is, normally, the LGPL option (see the changes involving F in Table 5). In support of these results, becoming GPL was good for FSP attractiveness when the initial state was the absence of a license (option A), the Academic Free License (C), or the LGPL (D). These strategic interventions were detected 47, 7 and 67 times, respectively (Table 1). Taken together, these findings suggest that it is good to avoid the GPL, but that it is better to adopt it than to have no license or the LGPL. The more challenging findings to explain for this type of change are the intervention from GPL to AFL (F → C) and its opposite (C → F), which are both positive: it is good to change from GPL to the Academic Free License, and it is also positive to change to GPL coming from the Academic Free License. This suggests that any change might be good for a project, as long as the change is aligned with the demands of the FSP's stakeholders.
The (lack of) symmetry in the effects of interventions can be better observed in the matrix shown in Table 5 (the superscript letters), a pattern of findings dealt with in detail later in this section.

Analyzing all interventions together, out of the 35 types observed in the sample, 13 were positive for attractiveness, 21 were negative, and only one was neutral. In total, 1012 intellectual property interventions were found (an average of more than one per project). Considering the initial state (the rows of Table 5), the most common origin of a managerial intervention is F (detected 417 times), and leaving it has a consistently positive impact on attractiveness. The least common origin is C (14 times), and it is associated with a negative change in attractiveness. The largest negative impact occurs for the abandonment of E (a 15% reduction), which was found 49 times. The mixed results apparent from a visual inspection of Table 5 suggest that interventions on types of licenses do not always work out for the better, and that there is essentially always an impact on attractiveness, although the evidence here is exploratory rather than inferential (the only exception is F to B). This reinforces how important it is to think the decision through carefully and strategically, as its impacts on attractiveness do not appear to be irrelevant.

Moreover, every intervention that targeted A, or that originated from E or G, impacted attractiveness negatively. Also, although changing from C to B does not change the project's type of license in terms of the restrictions analyzed in this research, it does impact attractiveness, suggesting that stakeholders prefer the AFL to MIT, for instance. This makes sense, as the AFL was designed to improve on MIT, which was the reason to include it separately in this study. The actual reasons for this finding, however, should be an object of future research, as it suggests there is more to the licensing scheme than this quantitative research captures.

Finally, going from G to B led to a 15% reduction in attractiveness. The dual-license option that G represents signals to a project's stakeholders that the software is suitable for a wider audience, as this intellectual property model can accommodate the interests of various groups, being more flexible in the market (a generic strategy). Moving away from this management model appears to push attractiveness down, always, as mentioned before (a focused strategy).

6 Results and Findings: the asymmetry of effects and the statistical answer to RQ2

The lack of symmetry of effects is interesting and deserves further consideration. None of the licensing schemas analyzed in this research escapes it: all of them have asymmetric effects with at least one other type of license. The most contradictory type of license is B, which has symmetric effects only with E and G. The least contradictory schema is A, showing an asymmetric effect only when B is involved (see the superscript letters in Table 5). This finding suggests either that a match between licensing scheme and a project's specific stakeholders exists, or that the direction of the effect of a given license is simply reversed depending on whether it is the source or the destination of the intervention.
The suitability of a licensing schema is likely to depend on the context of its adoption, that is, on the momentary demands of stakeholders; thus, no combination of licenses should be treated as ideal in general, but only in specific settings, according to stakeholders' expectations, on a project-by-project basis.

Moving now towards the statistically based answer to RQ2, the results reported so far were analyzed further. The reorganized dataset with mean month-project attractiveness per licensing schema was subjected to analysis (see Table 4 for descriptive statistics). Before getting into the mean difference comparisons (t-tests), the values of mean attractiveness over the whole period were considered. Taken together, these results signal that less restrictive licenses are more attractive on average, as the dual-license schema beats the academic unrestricted schema (e.g., MIT), which in turn is more attractive than the GPL's highly restrictive choice. The conclusion is that project attractiveness varies consistently with the licensing schema. Of course, this analysis is basic in statistical terms, but what is clear is that variations in the attractiveness indicators are associated with the licensing schema chosen by the FSP manager. The t-tests reported below give further confidence in the answer to RQ2.

As explained before, for the statistical mean comparisons, the monthly data was aggregated to increase the sample size, and the mean difference between each pair of licensing schemas was calculated, along with the standard deviation of these differences and the confidence intervals needed to determine statistical significance. The results are presented in Table 6 below, which indicates whether each mean difference is significant at a 0.05 type I error rate with the Bonferroni correction applied (marked with *), together with the effect size of each pair of licensing schemas based on Cohen's D (a sketch of this testing procedure is shown below).

According to the results in Table 6, 11 out of 21 pairs are statistically significantly different, using the most conventional statistical procedure to control for the inflated alpha that arises in multiple comparisons (Bonferroni). Of these 11, 4 have effect sizes between small and medium (above 0.2), following Cohen's well-known suggested interpretation. This signals that the licensing schema is indeed associated with the average numbers of web hits, downloads and members a project can attract. The differences in absolute numbers and effect sizes between schemas peak at the C-G pair, with a −1.35 mean difference in favor of the dual-license schema when a project moves away from the AFL option. The rest of the results for each pair of licensing schemas can be found in Table 6.

Overall, these statistical results on the variations of attractiveness, taken together, allow a solid answer to the second research question posed in this paper, namely whether an intellectual property intervention (a managerial change in licensing schema) is associated with attractiveness. The licensing schema is indeed associated with variations in attractiveness levels, not in all cases but in many, with a meaningful effect size in a few of them.
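For the testing step, here is a minimal conceptual sketch in R; the paper ran these tests in SPSS and does not fully detail how observations were paired, so this only shows the mechanics of a paired t-test with the Bonferroni threshold and Cohen's D (computed, as in Table 6, as the mean difference divided by its standard deviation).

```r
# x, y: equal-length vectors of paired attractiveness observations for two
# licensing schemas; 21 pairwise comparisons give the Bonferroni threshold
# 0.05 / 21 ~= 0.0024.
compare_pair <- function(x, y, n_comparisons = 21, alpha = 0.05) {
  d  <- (x - y)[!is.na(x - y)]        # paired differences
  tt <- t.test(d, conf.level = 0.99)  # 99% CI, as reported in Table 6
  list(mean_diff   = mean(d),
       cohens_d    = mean(d) / sd(d),
       t           = unname(tt$statistic),
       p_value     = tt$p.value,
       significant = tt$p.value < alpha / n_comparisons)
}
```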
In the next section, the general conclusions are discussed based on the answers found for both research questions, presenting directions for future research and guidelines for free and open source software managers.

7 Conclusions: implications for research and practice

This research focused on intellectual property rights interventions in free and open source software projects (FSP), that is, on changes to the licensing schemas that regulate the distribution allowances over the software source code, under the hypothesis that such managerial interventions affect stakeholders' perceptions of value, so that variations in FSP attractiveness before and after an intervention should be observable.

To validate this theoretical expectation, data on thousands of FSP over almost 4 years was filtered to identify a sample of 756 projects that changed their types of licenses, enabling an empirical study of the various managerial interventions detected over a period of 44 months. These interventions were cataloged and organized to allow comparisons of attractiveness changes grouped by intervention type, an analysis so far missing from the free software literature. Moreover, further reorganization of the original datasets allowed comparisons of projects' attractiveness to verify whether the licensing schemas adopted by FSP managers were associated with project performance in attracting developers, users and visitors, represented by a linear combination of the numbers of members, downloads and web hits. The classification schema for licenses developed in this paper also represents a step forward in the literature: up to now, the reality that projects adopt several licenses with apparently contradictory allowances over the source code (GPL together with a public domain license, for example) was not captured in previous research. The result is a more complex but more accurate classification, with, of course, pros and cons.

As a general conclusion, the results indicate that the legal terms specified in the license are indeed associated with project attractiveness as an aggregated measure. This is in line with previous research, which led to the expectation that the various business models possible with open source, expressed through licensing schemas, are related to success in attracting users and developers [10, 12, 26]. However, moving beyond the previously published literature, the findings suggest that the specifics of this generic hypothesis are not yet well understood.

It has been found that changes in the software's rights of distribution cannot be fully understood in generic terms alone, as interventions differ in the attractiveness variations associated with them, being beneficial or not depending on much more than what is known from the published literature on free software. This research is the first to point that out, providing ground for future (case/qualitative) studies to follow this lead and explore the specific reasons for a license intervention and the consequent increase or reduction in attractiveness based on stakeholders' perceptions.
Both project managers' and stakeholders' perceptions should be considered in these future research endeavors.

Table 6 Statistical tests for attractiveness mean differences

| Pair | Schemas | Mean | Std. deviation | Std. error mean | 99% CI lower | 99% CI upper | t | df | p-value | Cohen's D |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A – B | 0.22 | 3.97 | 0.10 | −0.02 | 0.47 | 2.33 | 1735 | 0.02 | 0.06 |
| 2 | A – C* | 0.90 | 3.87 | 0.27 | 0.19 | 1.61 | 3.28 | 200 | 0.00 | 0.23 |
| 3 | A – D | 0.04 | 3.74 | 0.09 | −0.19 | 0.27 | 0.50 | 1751 | 0.62 | 0.01 |
| 4 | A – E | −0.31 | 4.08 | 0.14 | −0.66 | 0.04 | −2.26 | 895 | 0.02 | −0.08 |
| 5 | A – F | 0.03 | 3.91 | 0.09 | −0.21 | 0.27 | 0.34 | 1749 | 0.74 | 0.01 |
| 6 | A – G* | −0.67 | 4.22 | 0.11 | −0.95 | −0.39 | −6.24 | 1553 | 0.00 | −0.16 |
| 7 | B – C | 0.39 | 3.44 | 0.25 | −0.24 | 1.03 | 1.61 | 195 | 0.11 | 0.11 |
| 8 | B – D* | −0.38 | 3.57 | 0.05 | −0.51 | −0.24 | −6.95 | 4356 | 0.00 | −0.11 |
| 9 | B – E* | −0.57 | 3.84 | 0.13 | −0.91 | −0.22 | −4.26 | 829 | 0.00 | −0.15 |
| 10 | B – F* | −0.32 | 3.95 | 0.06 | −0.48 | −0.17 | −5.44 | 4419 | 0.00 | −0.08 |
| 11 | B – G* | −0.90 | 4.19 | 0.11 | −1.18 | −0.62 | −8.22 | 1458 | 0.00 | −0.22 |
| 12 | C – D | −0.06 | 3.44 | 0.25 | −0.72 | 0.59 | −0.26 | 186 | 0.80 | −0.02 |
| 13 | C – E* | −0.90 | 3.60 | 0.24 | −1.53 | −0.28 | −3.77 | 224 | 0.00 | −0.25 |
| 14 | C – F | 0.09 | 3.23 | 0.23 | −0.50 | 0.69 | 0.40 | 197 | 0.69 | 0.03 |
| 15 | C – G* | −1.35 | 3.79 | 0.25 | −2.01 | −0.68 | −5.28 | 220 | 0.00 | −0.36 |
| 16 | D – E* | −0.60 | 3.67 | 0.13 | −0.93 | −0.27 | −4.70 | 817 | 0.00 | −0.16 |
| 17 | D – F | 0.07 | 3.66 | 0.05 | −0.07 | 0.21 | 1.28 | 4614 | 0.20 | 0.02 |
| 18 | D – G* | −0.84 | 3.95 | 0.10 | −1.10 | −0.57 | −8.17 | 1490 | 0.00 | −0.21 |
| 19 | E – F* | 0.65 | 3.85 | 0.13 | 0.31 | 0.99 | 4.88 | 836 | 0.00 | 0.17 |
| 20 | E – G | −0.31 | 4.07 | 0.13 | −0.65 | 0.04 | −2.29 | 927 | 0.02 | −0.08 |
| 21 | F – G* | −0.77 | 4.16 | 0.11 | −1.05 | −0.50 | −7.20 | 1504 | 0.00 | −0.19 |

*indicates significance at 0.05 with the Bonferroni correction applied (p < 0.0023 = 0.05/21)
Cohen's D calculated as the mean divided by the std. deviation; in the original typeset table, a superscript letter marks effect sizes between small and medium
Source: Author's own

This future line of inquiry based on case/qualitative studies would also be able to shed light on the asymmetric effects detected in the sample. Quite often, an intervention from one license to another did not have the opposite effect of the reverse change, so a simple vice-versa reading is not possible. Probably, FSP stakeholders hold expectations about changes that might occur in the license terms of the free software they intend to adopt or contribute to. This means that, depending on the current license (the anchor), the effect of changing to one and the same license might differ; and that the specific interests of the project's stakeholders also matter (e.g., hardware production or service sale).
Managers should take this into account when considering a license change.

FSP managers should be aware that the success of their projects is linked to their choice of license, as fewer market resources – the attention of users and the labor of developers – might flow in their direction depending on it. This means that managers must understand who the relevant stakeholders of their application are and what they want out of the software source code, and attempt to meet their expectations, considering a change in licensing only after direct negotiation with these stakeholders to avoid unwanted consequences. This research indicates that there is no silver bullet concerning the right licensing schema, or business model, signaling that the general hypothesis explored here needs further elaboration.

Academically speaking, a contingent type of theory, perhaps stakeholder-based, needs to be developed to explain the impact of the license schema on attractiveness depending on context. To help guide future researchers in that direction, it is already possible to highlight that a generic strategy (multiple licenses) appears to be superior to a specific license schema, perhaps because it accommodates stakeholders' conflicting interests better. This would explain the noticeable trend towards adopting the "various licenses" strategy, and it demonstrates how important it is to improve the classification schema previously adopted in the literature.

In conclusion, intellectual property interventions are not always beneficial for a free software project, but they are almost invariably associated with attractiveness variations. Accordingly, FSP managers should be aware of the importance of carefully selecting, and changing, the type of license if an FSP is to (continuously) succeed through growing market interest in the application and its source code. Nevertheless, such an intervention should not be decided without awareness of the specific project under consideration and its stakeholders' future intentions for the software.

Methodologically speaking, future research should keep pursuing the license-attractiveness relationship, analyzing this longitudinal type of data with more advanced inferential statistical techniques, such as structural equation modeling, to explore and understand the causal relationships more rigorously. The t-tests with the Bonferroni procedure applied here are a basic and reliable choice for the problem at hand, but analytical improvements are possible and welcome for a collective scientific effort towards knowledge accumulation. Another downside of this research is its sample, which was restricted to SourceForge.net projects. Nowadays there are many other free software repositories that could be considered. Nevertheless, the findings reported here are likely to hold across these repositories, a hypothesis that future research can also verify.

Finally, the measures of attractiveness adopted here are another point for improvement in future research. Only the numbers of web hits, downloads and members were used, but various other measures are possible. For example, one could use market share as an alternative, or survey methods to evaluate attractiveness subjectively. Moreover, attractiveness is probably the consequence of many things besides the license chosen by the project manager, so other factors should be considered in future research.
In this paper, this endogeneity issue was dealt with through a sampling procedure that identified projects of various kinds and levels of maturity, thereby controlling for some of those effects. Additionally, the results discussed here appear complex, but they seem to be a more accurate representation of FSP reality. As such, they are not yet fully understood in themselves, and future research should therefore use the same dataset, made available along with this paper, with different analytical and theoretical approaches to shed more light on these projects' behaviour over time.

8 Endnotes

1. http://www.gnu.org/gnu/initial-announcement.en.html
2. http://www.nber.org/papers/w9363
3. http://dl.acm.org/citation.cfm?id=2597116
4. http://flossmole.org/
5. http://www3.nd.edu/~oss/Data/data.html
6. http://thestatsgeek.com/2013/09/28/the-t-test-and-robustness-to-non-normality/
7. http://nrs.harvard.edu/urn-3:HUL.InstRepos:11718205

9 Additional file

Additional file 1: Dataset with the raw data used in the research. (CSV 1489 kb)

Abbreviations
AFL: Academic Free License; FSP: Free and open source software projects; GPL: General Public License; IPI: Intellectual property interventions; MIT: Massachusetts Institute of Technology (the license)

Acknowledgements
I appreciate the comments and guidance provided by Professors Julio Singer (statistics, USP) and Fabio Kon (computer science, USP). Their contributions in the initial stages of this research were incredibly helpful. I also thank the Center for Technology Development (CDT) of the University of Brasilia (UnB) for the technical help provided through the work of Raphael Saigg. A previous version of this paper was presented at CSCW 2011.

Funding
I thank FAPESP (2009/02046-2) for funding.

Authors' contributions
I am the sole author.

Ethics approval and consent to participate
Not needed; only secondary and public data were used.

Competing interests
The author declares no competing interests.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Received: 2 March 2016 Accepted: 12 July 2017
Published online: 07 August 2017

References
1. McCafferty D. Should code be released? Commun ACM. 2010;53(10).
2. Stone R. Earth-observation summit endorses global data sharing. Science. 2010;330(6006).
3. Sojer M, Henkel J. Code reuse in open source software development: quantitative evidence, drivers, and impediments. J Assoc Inf Syst. 2010;11(12):868–901.
4. Allen RC. Collective invention. J Econ Behav Organ. 1983;4:1–24.
5. von Hippel E. Cooperation between rivals: informal know-how trading. Res Policy. 1987;16:291–302.
6. Colazo J, Fang Y. Impact of license choice on open source software development activity. J Am Soc Inf Sci Technol. 2009;60(5).
7. Rosen L. Open source licensing: software freedom and intellectual property law. Prentice Hall; 2004.
8. Stewart KJ, et al. Impacts of license choice and organizational sponsorship on user interest and development activity in open source software projects. Inf Syst Res. 2006;17(2).
9. Raymond ES. The cathedral & the bazaar: musings on Linux and open source by an accidental revolutionary. O'Reilly; 2001.
10. Fitzgerald B. The transformation of open source software. MIS Q. 2006;30(3).
11. Agerfalk P, Fitzgerald B. Outsourcing to an unknown workforce: exploring opensourcing as a global sourcing strategy. MIS Q. 2008;32(2).
12. Santos C, Kuk G, Kon F, Pearson J. The attraction of contributors in free and open source software projects. J Strateg Inf Syst. 2013;22(1):26–45.
13. Maillart T, Sornette D, Spaeth S, von Krogh G. Empirical tests of Zipf's law mechanism in open source Linux distribution. Phys Rev Lett. 2008;101.
14. Wiggins A, Howison J, Crowston K. Heartbeat: measuring active user base and potential user interest in FLOSS projects. In: Proceedings of the Fifth International Conference on Open Source Systems (OSS). 2009. p. 94–104.
15. Crowston K, Howison J, Annabi H. Information systems success in free and open source software development: theory and measures. Softw Process Improv Pract. 2006;11(2):123–48.
16. Vendome C, Linares-Vásquez M, Bavota G, Di Penta M, German DM, Poshyvanyk D. When and why developers adopt and change software licenses. In: Proceedings of the 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME '15). Washington, DC: IEEE Computer Society; 2015. p. 31–40. http://dx.doi.org/10.1109/ICSM.2015.7332449.
17. Stewart K, Gosain S. The impact of ideology on effectiveness in open source software development teams. MIS Q. 2006;30(2):291–314.
18. Sen R, et al. Determinants of the choice of open source software license. J Manag Inf Syst. 2008;25.
19. Lerner J, Tirole J. The scope of open source licensing. J Law Econ Org. 2005;21(1).
20. Raymond E. The cathedral and the bazaar. Knowl Technol Policy. 1999;12(3):23–49.
21. Singh P, Phelps C. Networks, social influence, and the choice among competing innovations: insights from open source software licenses. Inf Syst Res. 2009;24(3):539–60.
22. Wu Y, Manabe Y, Kanda T, German DM, Inoue K. A method to detect license inconsistencies in large-scale open source projects. In: Proceedings of the 12th Working Conference on Mining Software Repositories (MSR 2015), Florence, Italy, May 16–17, 2015. IEEE; 2015.
23. Howison J, Conklin M, Crowston K. FLOSSmole: a collaborative repository for FLOSS research data and analyses. Int J Inf Technol Web Eng. 2006;1(3):17–26.
24. Mardia K, et al. Multivariate analysis (Probability and Mathematical Statistics). Academic Press; 1980.
25. Hannan M, Freeman J. Structural inertia and organizational change. Am Sociol Rev. 1984;49(2):149–64. http://www.jstor.org/stable/2095567.
26. Watson RT, et al. The business of open source. Commun ACM. 2008;51(4):41–6.
{"id": "a8813d7c19aef18cee8f69ede52c65d5bbf76b6a", "text": "Code Reuse in Open Source Software Development: \nQuantitative Evidence, Drivers, and Impediments\n\nMarch 2010\nManuel Sojer\u00b9 and Joachim Henkel\u00b9, \u00b2\n\n\u00b9Technische Universit\u00e4t M\u00fcnchen, Sch\u00f6ller Chair in Technology and Innovation Management, Arcisstr. 21, D-80333 Munich, Germany. sojer|henkel@wi.tum.de\n\u00b2Center for Economic Policy Research (CEPR), London\n\nAbstract\n\nThe focus of existing open source software (OSS) research has been on how and why individuals and firms add to the commons of public OSS code\u2014that is, on the \u201cgiving\u201d side of this open innovation process. In contrast, research on the corresponding \u201creceiving\u201d side of the innovation process is scarce. We address this gap, studying how existing OSS code is reused and serves as an input to further OSS development. Our findings are based on a survey with 686 responses from OSS developers. As the most interesting results, our multivariate analyses of developers\u2019 code reuse behavior point out that developers with larger personal networks within the OSS community and those who have experience in a greater number of OSS projects reuse more, presumably because both network size and a broad project experience facilitate local search for reusable artifacts. Moreover, we find that a development paradigm that calls for releasing an initial functioning version of the software early\u2014as the \u201ccredible promise\u201d in OSS\u2014leads to increased reuse. Finally, we identify developers\u2019 interest to tackle difficult technical challenges as detrimental to efficient reuse-based innovation. Beyond OSS, we discuss the relevance of our findings for companies developing software and for the receiving side of open innovation processes in general.\n\nKeywords: Innovation, software development, open source software, code reuse, software reuse\n\nWe are grateful to Oliver Alexy, Timo Fischer, Stefan Haefliger, Francesco Rullani, and seminar participants at the Pre-ECIS 2009 Open Source and Innovation Workshop, the TUM/Imperial Paper Development Workshop 2009, and the Open Source, Innovation, and Entrepreneurship Workshop 2010 for helpful comments.\n1. Introduction\n\nThe public development of open source software (OSS)\\(^1\\) is a specific instance of open innovation, a term coined by Chesbrough (2003). A large body of empirical work has addressed the \u201cgiving\u201d side of this open innovation process, that is, exploring the question of why and how individuals (e.g. Ghosh et al., 2002; Hars and Ou, 2002; Hertel et al., 2003; Lakhani and Wolf, 2005; Henkel, 2009) and firms (e.g. West, 2003; Dahlander, 2005; Gruber and Henkel, 2005; Bonaccorsi et al., 2006; Henkel, 2006; Rossi Lamastra, 2009) make their developments freely available for others to use and build upon.\n\nIn contrast, research on the \u201creceiving\u201d side of the innovation process,\\(^2\\) that is, on the extent, drivers, and impediments of reuse of existing OSS code in subsequent OSS development, is scarce and either based on high-level code or dependency analyses (German, 2007; Mockus, 2007; Spaeth et al., 2007; Chang and Mockus, 2008), or on case studies (von Krogh et al., 2005; Haefliger et al., 2008). 
While this research suggests that code reuse is of major importance for OSS development, a large-scale quantitative study of the phenomenon at the level of individual developers is lacking.

A better understanding of code reuse in OSS is desirable, not only in itself, but also because it will yield insights on reuse beyond OSS. Reuse has long been recognized as crucial to overcoming the "software crisis" (Naur and Randell, 1968), as it allows for more efficient and more effective development of software of higher quality (Krueger, 1992; Kim and Stohr, 1998). More generally, the literature on innovation management points to knowledge reuse as an important factor mitigating the cost of innovation (e.g. Langlois, 1999; Majchrzak et al., 2004). Despite significant advances in reuse research, software reuse in commercial firms especially is still not without issues, and its antecedents are not yet fully understood (e.g. Desouza et al., 2006; Sherif et al., 2006). Some scholars suspect that reuse failure is often related to individual developer issues (e.g. Isoda, 1995; Morisio et al., 2002). However, there is a paucity of research, especially quantitative research, addressing the view of individual developers on reuse (e.g. Sen, 1997; Ye and Fischer, 2005).

---

¹ For better readability, we will use the term Open Source software in this article, but our work also refers to Libre and Free software, which differs from open source in ideological considerations but not in technical ones. See http://www.gnu.org/philosophy/free-sw.html for further information.

² The users of OSS obviously also receive code; however, since they do not base their own innovations on it, we do not consider them to be on the "receiving" side of the OSS innovation process.

Our aim is to fill the above gap regarding the "receiving" side of OSS innovation and to leverage our findings to augment the general software reuse literature, adding insights on the perspectives of individual developers on reuse through a survey-based empirical study of code reuse in public OSS development. We quantitatively assess the importance of code reuse as one form of reuse in OSS development, and explore its drivers and impediments at the level of individual developers. Our empirical approach relies on a web-based survey to which we had invited, via email, 7,500 developers from SourceForge.net, the largest OSS development platform.

Our results point out that code reuse does play a major role in OSS development; developers reported that, on average, 30 percent of the functionality they have implemented in their current main projects has been based on reused code. Investigating the drivers of reuse in multivariate analyses, we find that developers who believe in the effectiveness, efficiency, and quality benefits of reuse, and developers who see reuse as a means to work on their preferred development tasks, rely more on existing code. Further, presumably because both a larger network and experience in a greater number of projects facilitate local search for reusable artifacts, developers with larger personal networks within the OSS community and experience in a greater number of OSS projects reuse more. Moreover, we find that a development paradigm that calls for releasing an initial functioning version of the product early, and so delivering a "credible promise", leads to increased reuse.
Finally, developers’ interest in tackling difficult technical challenges is identified as detrimental to efficient reuse-based innovation, while developers’ commitment to the OSS community leads to increased reuse behavior.

The remainder of the paper is organized as follows. The next section reviews relevant literature on software reuse and OSS, followed by a section that presents our research model and hypotheses. After that, we elaborate on our data and measures before we present our analyses and results. The last section concludes with a summary and a discussion. A supplemental appendix contains further tables referred to in the paper but not included in its main body for space considerations.

2. Literature Review

The theoretical foundation of this paper draws on two streams of the literature. First, we review relevant software engineering literature on reuse and its implementation in firms. Second, scholarly work on OSS development provides the context of our work, establishing basic concepts of why developers contribute to OSS projects and how they do so. A summary of the small base of scholarly work on code reuse in OSS development concludes the literature review.

2.1. Reuse in Software Development

Software reuse (as the software-specific form of knowledge reuse (e.g. Langlois, 1999; Majchrzak et al., 2004)) is “[…] the process of creating software systems from existing software rather than building software systems from scratch” (Krueger, 1992, p. 131). The artifacts most commonly reused in software development are components (pieces of software that encapsulate functionality and have been developed specifically for the purpose of being reused) and snippets (multiple lines of code from existing systems) (Krueger, 1992; Kim and Stohr, 1998). Our study focuses on these two artifacts, and we refer to their reuse as “code reuse.” Software reuse promises not only increased development efficiency and reduced development times, but also improved software quality and better maintainability, because developers do not have to develop everything from scratch but rather can rely on existing, proven, and thoroughly tested artifacts (Frakes and Kang, 2005).

Despite these compelling benefits, software reuse still fails frequently in commercial firms, sometimes for technical, but most often for human and organizational reasons (e.g. Morisio et al., 2002). The importance of the individual developer in successful reuse is undisputed. Isoda (1995, p. 183), for instance, concedes: “unless they [software engineers] find their own benefits from applying software reuse […] they will not […] perform reuse.” Still, there is a paucity of reuse research that focuses on the individual developer (Sen, 1997; Ye and Fischer, 2005).

OSS appears to be a unique opportunity to enhance our knowledge about the role of individuals in successful reuse-based innovation, and in software reuse in particular, for two reasons. First, contrary to commercial software developers, who are often restricted to the limited amount of code available in their firms’ reuse repositories, the abundance of OSS code available under licenses that generally permit reuse in other OSS projects provides OSS developers with broad options to reuse existing code if they wish to do so. Second, the broad scholarly knowledge about the motivations and beliefs of OSS developers should be helpful in analyzing the perspectives of individual developers on software reuse.
The next section establishes community-based, public OSS development as the empirical setting of our analysis.\n\n2.2. Open Source Software Development\n\nStrictly speaking, software is OSS if it comes under an open source license. Such a license grants users of the software the right to access, inspect, and modify the source code of the software and distribute modified or unmodified versions of it.\\(^3\\) Since much OSS is developed by informal collaboration in public OSS projects (Crowston and Scozzi, 2008), the term \u201cOSS\u201d is often also understood to imply that the software has been developed in the \u201cOSS fashion\u201d (von Krogh et al., 2008). Typically, the development of software in OSS projects differs strongly from the development of traditional software in most commercial setups (Crowston et al., 2009). In this context the motivation of developers to spend considerable time on their OSS projects and the process of OSS development are of particular relevance to our study.\n\nA large body of literature has emerged that addresses the first topic. Common to most of this work is the finding that OSS developers work on their projects for both intrinsic and extrinsic reasons. As intrinsic motivations, scholars have identified identification with the OSS community and the resulting wish to support it (Hertel et al., 2003), ideological support of the OSS movement (Stewart and Gosain, 2006), the desire to help others (Hars and Ou, 2002), and, most importantly, the fun and enjoyment that developers experience when working on their projects (Lakhani and Wolf, 2005). Based on psychology research (Amabile et al., 1994), Sen et al. (2008) further differentiate fun into the enjoyment and \u201cflow\u201d feelings (Cs\u00edkszentmih\u00e1lyi, 1990) that developers perceive when writing code and the satisfaction of solving challenging technical problems. Extrinsic motivations of\n\n\\(^3\\) Whether a software license is an open source license is determined by the Open Source Initiative (http://www.opensource.org).\nOSS developers may derive from the wish to enhance their reputation in the OSS community (Lakhani and Wolf, 2005), to hone their software development skills (Hars and Ou, 2002), to develop or adapt software functionality to their own needs (Hertel et al., 2003), and to signal their skills to potential employers and business partners (Lerner and Tirole, 2002). Also, they may be paid directly for their OSS work, for example, if it is part of their job (Ghosh et al., 2002).\n\nRegarding the process of OSS development, OSS projects are often started by an individual developer who has a need for certain software functionality that does not yet exist (Raymond, 2001). After initialization, the developer typically wants to attract other developers to participate in the project. An incentive for others to join the project is that it offers interesting tasks and also seems feasible (von Krogh et al., 2003). The founder can enhance this recruitment process by delivering a \u201ccredible promise\u201d, which Lerner and Tirole (2002, p. 220) describe as \u201ca critical mass of code to which the programming community can react. 
Enough work must be done to show that the project is doable and has merit.” However, not only does the founder have to prove that the project is worthy of support by others, but developers interested in joining a project also often have to show that they possess the required skills by solving some of the technical issues the project is currently facing (von Krogh et al., 2003).

2.3. Code Reuse in Open Source Software Development

There is scant research on code reuse in OSS, and so far no large-scale quantitative data on the developer level exist. Initial academic work, however, suggests that code reuse is practiced in OSS projects at a high level. Analyzing the code of a large number of OSS projects, Mockus (2007) and Chang and Mockus (2008) measure the overlap of filenames among the 38.7 thousand OSS projects in their database and conclude that about 50 percent of the components exist in more than one project. Mockus’s (2007) data even suggest that code reuse is more popular in OSS development than in the traditional commercial closed source software arena. Following a different approach, both German (2007) and Spaeth et al. (2007) rely on dependency information available in Linux distributions to show that most packages in these distributions require other packages, whose functionality they reuse.

Using case studies on the project and individual developer level rather than large-scale code analyses, von Krogh et al. (2005) and Haefliger et al. (2008) confirm that OSS developers reuse existing code—in the form of components and snippets—as well as abstract knowledge—such as algorithms and methods. Diving into the mechanics of code reuse in OSS, Haefliger et al. (2008) find that OSS developers reuse code because they want to make their development work more efficient, they lack the skills to implement certain functionality by themselves, they prefer some specific development work over other tasks, or they want to deliver a “credible promise” with their project. The authors further point out that there exist equivalents to some of the components of corporate reuse programs, such as OSS repositories like SourceForge.net, which can substitute for internal reuse repositories within firms, or the reuse frequency of a component, which can serve as a proxy for the component’s quality and thus substitutes for certification.

3. Research Questions and Hypotheses

Building on the existing research on code reuse in OSS presented above, this paper seeks to use large-scale quantitative data obtained through a survey among OSS developers to answer the question: under what conditions do developers prefer reusing existing code over developing their own code from scratch? In this context, the following specific research questions will be addressed:

1. How important is code reuse in OSS development projects?
2. What do OSS developers perceive as the benefits of code reuse, and what do they see as the issues and impediments?
3. How is the degree of code reuse in open source developers’ work determined by their characteristics and those of their project?

The first question establishes if and to what extent OSS developers reuse existing code, while the subsequent questions explore how this behavior can be understood and explained. Question three will be addressed using regression analyses. To guide the choice of explanatory variables and formulate hypotheses, a research model is developed in the following section.
To provide a solid theoretical base, our research model builds on the well-established Theory of Planned Behavior (TPB) (Ajzen, 1991) and is refined and extended with both interviews and literature on code reuse and OSS.

3.1. Theory of Planned Behavior

Initially developed in the context of social psychology, TPB as a behavioral model has found wide adoption in various fields of information systems (IS) research. TPB is a parsimonious and rather generic model explaining human behavior and thus provides an excellent starting point to investigate code reuse as one particular form of behavior. Research related to the topic of our study has relied on TPB or its sister model TAM (Technology Acceptance Model) (Davis et al., 1989) to explain, for example, software developers’ application of development methodologies and tools such as CASE tools (Riemenschneider and Hardgrave, 2001), object-oriented software development (Hardgrave and Johnson, 2003), or generally formalized software development processes (Riemenschneider et al., 2002; Hardgrave et al., 2003). Following the encouraging results of this stream of research, we base our research model on TPB.

TPB posits that behavior is determined by intention, which itself is predicted by three factors: (1) attitude toward the behavior, (2) subjective norms, and (3) perceived behavioral control. Attitude is formed by the individual’s beliefs about the consequences and outcomes (both positive and negative) of the behavior. Subjective norms refer to pressure from the social environment, as perceived by the individual, to perform or not perform the behavior. Lastly, perceived behavioral control is individuals’ perception of their ability to perform the behavior. It can be further broken down into individuals’ “capability” of performing the behavior and the “controllability” (Ajzen, 2002) individuals have over the behavior, that is, whether the decision to perform the behavior is theirs or not.
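Editorially, TPB can be summarized as two stacked equations: intention = f(attitude, norms, control) and behavior = f(intention). The following minimal sketch illustrates that structure on synthetic data; all variable names and coefficients are invented for illustration and are not the paper’s estimation.

```python
# Illustrative TPB structure on synthetic data (not the paper's model):
# intention is predicted by attitude, subjective norms, and perceived
# behavioral control; behavior in turn follows intention.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
attitude = rng.normal(size=n)   # beliefs about outcomes of the behavior
norms = rng.normal(size=n)      # perceived social pressure
control = rng.normal(size=n)    # perceived capability/controllability

# Assumed (made-up) weights for the synthetic data-generating process.
intention = 0.5 * attitude + 0.2 * norms + 0.3 * control \
    + rng.normal(scale=0.5, size=n)
behavior = 0.8 * intention + rng.normal(scale=0.5, size=n)

X = sm.add_constant(np.column_stack([attitude, norms, control]))
print(sm.OLS(intention, X).fit().params)  # recovers ~[0, 0.5, 0.2, 0.3]
print(sm.OLS(behavior, sm.add_constant(intention)).fit().params)  # ~[0, 0.8]
```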
3.2. Research Model and Hypotheses

Using TPB as a starting point for our research model (see Figure 1), we argue that developers’ reuse behavior is influenced by their attitude toward code reuse, their subjective norms on code reuse, and the behavioral control they perceive regarding code reuse. Contrary to typical work relying on TPB, we do not employ generic scales to measure these constructs in most cases, but rather operationalize them with unique scales and single items explicitly framed in the OSS and code reuse context. As a second deviation from typical TPB research, we test the research model with different regressions that use either intention to reuse or actual reuse behavior as the dependent variable. Since we do not combine intention and behavior into one construct, but rather employ only one of them in each of our regression models, we stay true to the TPB assumption that the two concepts are related but not the same. Comparing the results of the regressions with different dependent variables adds robustness to our findings.

Note that our research model aims at explaining developers’ reuse behavior without explicitly differentiating between component and snippet reuse. In conventional software development, component reuse is typically considered black-box reuse, implying that developers can neither access nor modify the source code of the components they reuse. Thus, component reuse is assumed to follow different drivers than white-box reuse (e.g. snippet reuse), where access to the source code is given (Ravichandran and Rothenberger, 2003). In the context of OSS, however, the source code of components is also available to reusing developers, and our survey data indicate that about 50% of the developers exercise the option to modify it. Because of this, we expect no fundamental differences between the drivers of component and snippet reuse and treat both forms of code reuse jointly in our research model.

Based on our interviews\(^4\) and existing research, we have identified five main drivers that influence developers’ attitude toward code reuse, since they determine whether developers expect positive or negative outcomes from reuse. These drivers are developers’ perceptions of (1) the effectiveness of reuse, (2) the efficiency of reuse, (3) the software quality attained by reuse, (4) the task selection benefits resulting from reuse, and (5) the potential loss of control over their project that might come with reuse. The link between reuse and effectiveness, efficiency, and software quality is straightforward. In addition, code reuse might result in task selection benefits if developers can avoid certain tasks by reusing existing code (Haefliger et al., 2008). As the fifth driver, reuse can lead to control loss, as a developer reusing code from another project might become dependent on this project to develop the code further, fix bugs, and so on. Since developers with a more positive perception of the above drivers should hold a more positive attitude toward reuse, TPB suggests that they rely more on reusing existing code in their work. Based on this logic, the following hypotheses can be derived for the five drivers:

*Developers reuse more existing code…*

\(H1a:\) …the more strongly they perceive the effectiveness benefits of reuse.

\(H1b:\) …the more strongly they perceive the efficiency benefits of reuse.

\(H1c:\) …the more strongly they perceive the quality benefits of reuse.

\(H1d:\) …the more strongly they perceive the task selection benefits of reuse.

\(H1e:\) …the less strongly they perceive the loss of control risks of code reuse.

Since the primary interest of our research is to understand how individual developer characteristics influence reuse, both subjective norms and perceived behavioral control, as the two other parts of TPB besides attitude, are treated as control variables in our model. The controllability portion of perceived behavioral control is operationalized by six variables relating to project attributes. Two dummy variables indicate whether there exist policies in the project supporting or discouraging code reuse. Four Likert-scale variables capture the intensity of general impediments to code reuse: a lack of reusable code for the specific requirements of a developer’s project; conflicts between the license of the developer’s project and the license of the code to be reused; incompatibilities between programming languages, when the code to be reused is written in a different language than the developer’s project (Haefliger et al., 2008), or when the programming language of the focal project makes it difficult to include code in foreign languages; and an architecture of the developer’s project that is not modular enough to allow for easy reuse of existing code (Baldwin and Clark, 2006).

\(^4\) See the next section for an overview of our interviews.
The capability portion of perceived behavioral control is operationalized through each developer\u2019s self-reported skill level in software development, arguing that without some proficiency, developers will not be able to understand and integrate foreign code.\n\nTPB research posits that attitude toward a behavior, subjective norms, and perceived behavioral control explain behavior comprehensively (Ajzen, 1991). We stay true to this assumption when we add further groups of hypotheses and control variables hereinafter because all of these additional groups could be incorporated into the three original TPB groups of attitude, subjective norms, and perceived behavioral control. We did, however, choose to display some hypotheses as independent groups to better illustrate the ideas behind them. Moreover, some further control variables are shown as a group of their own because their influence on attitude, subjective norms, and perceived behavioral control is rather indirect.\n\nIn the first additional hypotheses group, we argue that developers\u2019 access to local search leads to increased code reuse. Banker et al. (1993) show that developers will reuse if their costs for searching and integrating existing code are lower than for developing it from scratch. These costs for searching and integrating are lower if OSS developers can turn to their own experience or that of fellow OSS developers who can point them to the code they need, assure them of its quality, and explain to them how it works and how to best integrate it (Haefliger et al., 2008). Consequently, we posit that developers with a larger personal network of other OSS developers will reuse more code because they can reap the benefits of local search (H2a). Similarly, developers who have been active in more OSS projects in the past\nwill also show increased code reuse behavior \\((H2b)\\). Summarizing, the following two hypotheses can be derived regarding developers\u2019 access to local search.\n\n**Developers reuse more existing code\u2026**\n\n\\(H2a: \\) \u2026the larger their personal OSS network.\n\n\\(H2b: \\) \u2026the greater the number of OSS projects they have been involved in.\n\nFurther, we also conjecture a relationship between the maturity of an OSS project and the code reuse behavior of its developers. As pointed out in the literature review section, OSS developers launching a project strive to deliver a \u201ccredible promise\u201d as quickly as possible in order to attract other developers\u2019 support. Code reuse is an excellent tool to accomplish that because it allows the addition of large blocks of functionality to a new project with limited effort (Haefliger et al., 2008). Further, code reuse can help a new project to overcome its \u201cliabilities of smallness\u201d (Aldrich and Auster, 1986) and quickly close the gap to established competing projects in its domain. Lastly, while code reuse is very helpful in the early phases of the life of an OSS project, we expect its importance to decline once the project has reached a certain level of maturity. At that point, the project has implemented all required basic functionality and turns toward fine-tuning the aspects that make it unique, which by definition is difficult with reused code. 
Thus, we posit that the less mature an OSS project is, the more code its developers will reuse \((H3)\).

\(H3:\) Developers reuse more existing code the less mature their project.

In the final group of hypotheses, we argue that the compatibility of code reuse with developers’ own goals in their project will influence the extent of their code reuse behavior. This is important because the “attitudes” group of our model presented above captures developers’ general attitude toward code reuse, while the “compatibility” group presented in the following helps to link these general attitudes to the developers’ work in one specific project. We follow Moore and Benbasat (1991, p. 195) and define compatibility as the degree to which code reuse “is perceived as being consistent with the existing values, needs, and past experiences” of an OSS developer, and focus primarily on “values” and “needs” (“experiences” being addressed by $H2b$). Our argumentation regarding the compatibility between developers’ project goals and their reuse behavior is based on the motivations of developers to participate in OSS projects described earlier.

Sen et al. (2008) show empirically that developers for whom tackling difficult technical problems is a main motivation to work on their project try to limit the number of team members involved in their project, because they want to solve the problems themselves, without the help of others. In similar fashion, developers who work on their project to tackle difficult technical challenges should reuse less existing code, because reuse would solve some of the challenges for them ($H4a$). In order to be able to focus on solving these difficult technical challenges by themselves, developers might very well show increased reuse behavior for other parts of their project, but we control for this effect by including developers’ perception of task selection benefits through reuse (see $H1d$ above).

Also supportive of our argumentation is DiBona et al.’s (1999, p. 13) description of the “satisfaction of the ultimate intellectual exercise” which developers feel “after completing or debugging a hideously tricky piece of recursive code that has been a source of trouble for days.” It seems likely that reuse would reduce the joy described, and thus developers for whom challenge seeking is a major motivation should reuse less existing code.

Related to the above effect of challenge seeking, reuse should also be of lower importance to developers who work on their project for the pleasure they experience when writing code ($H4b$). Code reuse would reduce their need to write their own code and thus the pleasure derived from doing so. Hars and Ou (2002, p. 28) provide a nice illustration of this argument when they quote an OSS developer explaining his motivation to work on his project with his “innate desire to code, and code, and code until the day I die.” It seems more than plausible that a developer feeling this way about coding would, ceteris paribus, reuse less. As for challenge seeking, one might argue that developers who code for fun might reuse more in order to focus on the most enjoyable tasks.
However, again, this is statistically controlled for by including developers’ perception of task selection benefits through reuse (see $H1d$ above).

The goal to improve one’s software development skills could affect reuse intensity in two directions. One could conjecture that developers who want to hone their skills purposefully reinvent the wheel in order to learn how it is done. Yet, we argue that countervailing effects dominate, such that developers for whom skill improvement is more important also reuse more existing code \((H4c)\). Our rationale is based on DiBona’s (2005) finding that OSS developers leverage existing code as a starting point for their learning, studying and modifying it to improve their own skills. We also found confirmation for this stance in our interviews,\(^5\) in which developers, for example, told us that they have “used code reuse as a way of learning” or pointed out that “reusing code snippets can really help to learn a new programming language.” Also supportive of our argumentation is the finding from our survey\(^6\) that about 50% of the developers modify the components they reuse and thus do not practice black-box reuse, in which they would not come into contact with the components’ source code.

Regarding community commitment as a motivation, we argue that developers who feel strongly committed to the OSS community and want it to be successful will reuse more code \((H4d)\). Code reuse helps these developers to write better software faster, and allows them to make the community stronger by contributing this software.

As the last two motivations conjectured to influence developers’ reuse behavior, we turn to reputation building, first within the OSS community and second for the purpose of signaling skills to potential commercial partners such as employers. Regarding developers’ reputation within the OSS community, we argue that developers seeking to improve their reputation will reuse more code \((H4e)\). Code reuse should make a project better and thus create more attention for the project within the OSS community, and also for the developers associated with the project. This argumentation receives support from Sen et al. (2008), who find that developers for whom OSS reputation building is important prefer to be part of a successful project with many other developers over being one of only a few developers of a less successful project. One could object that an OSS developer’s reputation is grounded in her technical skills, which she best proves with her unique—that is, not reuse-based—contributions to the OSS community. Yet, this argumentation is refuted by von Krogh et al.’s (2003) finding that developers, who need to prove their worthiness to join a project through their initial contributions, often include reused code in these contributions. Furthermore, Raymond’s (2001, p. 24) famous saying that “good programmers know what to write. Great ones know what to rewrite (and reuse)” also leans toward our hypothesis that developers for whom reputation building in the OSS community is important will reuse more existing code.

\(^5\) See the next section for an overview of our interviews.

\(^6\) The survey is introduced in detail in the next section.
Finally, and basically following the same argumentation as above, we posit that developers who want to signal their software development skills to potential employers or business partners will reuse more code, because parties outside of the OSS community are more likely to become aware of successful OSS projects and their developers (H4f). Summarizing, we posit the following hypotheses addressing the compatibility between developers’ motivations to work on their project and code reuse:

Developers reuse more existing code…

H4a: …the less important challenge seeking…

H4b: …the less important coding fun and enjoyment…

H4c: …the more important skill improvement…

H4d: …the more important community commitment…

H4e: …the more important OSS reputation building…

H4f: …the more important commercial signaling…

…is for them as a motivation to work on their project.

Beyond these hypotheses, multiple additional control variables are included in our model to account for further contextual differences in code reuse behavior. These control variables encompass four groups. First, we account for the project characteristics: project size (number of project team members), technical complexity of the project, the project’s position in the software stack, and whether the project aims at creating a standalone executable application or a reusable component. In addition, we control for the level of professionalism and seriousness with which developers work on their current main project by including the number of years they have been involved in OSS, the average weekly hours they invest in their current main project, the share of the project’s functionality they developed themselves (as compared to their project team members), and whether they work or have worked as professional software developers. Moreover, we account for developers’ education and training on reuse, which has been shown to be a determinant of reuse behavior in software development firms in previous research (e.g. Frakes and Fox, 1995). Finally, we account for developers’ geographic residence at the continent level. Subramanyam and Xia (2008) have shown that developers from different geographies prefer, for example, different levels of modularity in their OSS projects. Following this line of thought, geographic origin might also be an antecedent of reuse behavior.

4. Research Design, Data and Measures

We collected data for our study using a web-based survey that was developed based on 12 interviews with OSS developers and on the existing literature. Moreover, all questionnaire items and questions were assessed for clarity by fellow researchers and OSS developers in a qualitative pretest. In the survey, we asked developers about their experiences with code reuse in the context of their current main OSS project. In order to capture the high heterogeneity of OSS projects and their developers, we chose the largest OSS project repository, SourceForge.net, as the platform from which to select survey participants. In April 2009, two rounds of quantitative pretests, to which in total 2,000 developers had been invited, were conducted to assess the quality of our questionnaire in terms of content, scope, and language. Following minor refinements based on an analysis of the pretests and feedback from the respondents, the main survey took place in July 2009.
An email was sent to 7,500 developers from SourceForge.net inviting them to participate in our survey. The developers were selected at random from all SourceForge.net developers who had been active on the platform in the first half of 2009.

---

\(^{7}\) The number of years a developer has been active in OSS is treated as a control variable and not included in the local search hypotheses because it is not the intensity of experience (as measured, e.g., by the number of years), but rather the breadth of experience (as measured, e.g., by the number of projects a developer has been involved in) that is conjectured to facilitate better access to local search and, consequently, more code reuse.

\(^{8}\) Ten of these interviews were conducted via phone or Internet-based voice communication; the two others were conducted via email exchange. Nine of the voice-based interviews were taped and transcribed; they had an average length of 49 minutes.

We received a total of 686 responses, equaling a response rate of 9.6 percent (338 invitations could not be delivered). This rate is similar to those obtained by other recent surveys among SourceForge.net developers (e.g. Wu et al., 2007; Sen et al., 2008). Eleven responses had to be eliminated due to inconsistent or corrupt entries, leaving us with 675 completed surveys.
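As a quick arithmetic check of these sample figures (illustrative only; all numbers are taken from the text above):

```python
# Back-of-the-envelope verification of the reported sample statistics.
invited = 7_500
undeliverable = 338
responses = 686

response_rate = responses / (invited - undeliverable)
print(f"{response_rate:.1%}")              # 9.6%, matching the reported rate

usable = responses - 11                    # 11 inconsistent/corrupt entries removed
coders = 624                               # respondents who write code (see below)
print(usable, f"{coders / usable:.0%}")    # 675 completed surveys, ~92% write code
```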
The demographic profile of the developers participating in our study (see Table 1) is largely consistent with that reported by other studies among OSS developers (e.g. Lakhani and Wolf, 2005; Sen et al., 2008). In particular, we find no indication that nonresponse has biased our sample to overrepresent less serious OSS developers. Of special relevance to our endeavor is the fact that only 92 percent (or 624) of the developers we surveyed actually write code for their OSS projects. As only developers writing code can practice code reuse, our further analyses will focus on these 624 developers.

| Table 1: Demographics of Survey Participants | Percentage |
|---------------------------------------------|------------|
| **Age (mean: 31.8, median: 30)** | |
| 1-19 | 5% |
| 20-29 | 42% |
| 30-39 | 35% |
| 40-49 | 13% |
| 50+ | 5% |
| **Residence** | |
| North America | 26% |
| South America | 5% |
| Europe | 54% |
| Asia and rest of world (RoW) | 15% |
| **Highest education level** | |
| Non-university education | 15% |
| Undergraduate or equivalent | 35% |
| Graduate or equivalent | 30% |
| Ph.D. and higher | 20% |
| **Task profile in open source projects** | |
| Includes writing code | 93% |
| Does not include writing code | 7% |
| **Hours spent working on main OSS project per week (mean: 8.8, median: 5)** | |
| 1-4 | 48% |
| 5-9 | 19% |
| 10-19 | 21% |
| 20+ | 12% |
| **Size of personal OSS network (mean: 29.9, median: 8)** | |
| 1-9 | 70% |
| 10-19 | 18% |
| 20+ | 12% |
| **Number of OSS projects ever involved in (mean: 3.7, median: 2)** | |
| 1-4 | 65% |
| 5-9 | 26% |
| 10-14 | 6% |
| 15+ | 3% |

Before starting the analysis of our data, we briefly assess the multi-item constructs we have employed to measure developers’ motivation to work on their main project. The items for these constructs were adopted from prior research both in the OSS domain (Hars and Ou, 2002; Lakhani and von Hippel, 2003; Roberts et al., 2006) and in psychological motivation research (Amabile et al., 1994; Clary et al., 1998), and were measured on seven-point Likert scales (“strongly disagree” to “strongly agree”). We took several steps to ensure validity and reliability of these measures. Content validity was qualitatively assessed through building on existing OSS literature whenever possible, discussions with fellow OSS researchers, and two rounds of pretests. Reliability was assessed via Cronbach’s $\alpha$ for each multi-item variable. Not all Cronbach’s $\alpha$ values exceed Straub’s (1989) rule of thumb of 0.8, but they all exceed Nunnally’s (1978) threshold of 0.6 (see Table A1 in the Appendix). Convergent validity was assessed through factor analysis, which confirms that all items have their highest loading with their respective intended construct and that all loadings are higher than 0.5 (Hair et al., 2006) (see Table A1 in the Appendix). Discriminant validity is demonstrated by showing that the square root of the average variance extracted of each construct is greater than its correlations with other constructs (see Table A2 in the Appendix), thus satisfying the Fornell-Larcker criterion (Fornell and Larcker, 1981).

---

\(^{9}\) Given the large number of surveys among SourceForge.net developers, one might suspect that especially the more active developers on this platform would show signs of “survey fatigue.” However, comparing the self-reported weekly hours developers spend working on their main project between our survey (mean: 8.8) and the first SourceForge.net survey ever taken by Lakhani and Wolf (2005) (mean: 7.5) mitigates these concerns. The additional finding that 69 percent of the developers in our survey have worked or still work as professional software developers, with an average tenure of 7.9 years, rules out the further concern that only less skilled programmers took part in our survey.

In order to reduce common method bias, we employed several measures during data collection as suggested by Podsakoff et al. (2003). We took care to formulate simple and unambiguous questions for our survey by discussing our questionnaire items with our interview partners and conducting multiple rounds of pretests. Further, survey respondents were assured, when the survey was introduced to them, that their responses would be treated strictly confidentially. Moreover, many of the survey items address motivations, attitudes, and beliefs for which by nature there are no right or wrong answers.

To estimate the presence of common method bias in our data after survey completion, we employed Harman’s test, in which all variables of a model are loaded onto a single factor in a principal component factor analysis. A significant amount of common method bias is assumed to exist if this one factor explains a large portion of all the variance in the data (Podsakoff et al., 2003). In our data, we find that the maximum variance explained by one factor is 9.3 percent, which does not hint at strong common method bias.
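Both measurement checks described above are standard and straightforward to reproduce. The sketch below computes Cronbach’s α for a multi-item scale and the Harman single-factor share of variance on synthetic response data; the function names are ours, and only the thresholds (0.6/0.8) and the 9.3 percent figure come from the text.

```python
# Sketch of the two measurement checks on synthetic data (illustrative only).
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: (n_respondents, k_items). Classic tau-equivalent reliability."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def harman_first_factor_share(data: np.ndarray) -> float:
    """Share of total variance captured by the first principal component."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    eigvals = np.linalg.eigvalsh(np.cov(z, rowvar=False))[::-1]
    return eigvals[0] / eigvals.sum()

rng = np.random.default_rng(1)
latent = rng.normal(size=(675, 1))                 # one underlying construct
items = latent + rng.normal(scale=0.7, size=(675, 4))  # a 4-item scale
print(f"alpha = {cronbach_alpha(items):.2f}")      # well above the 0.6 threshold

survey = rng.normal(size=(675, 30))                # 30 unrelated model variables
print(f"first factor: {harman_first_factor_share(survey):.1%}")  # small share
```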
5. Results and Discussion

Following the research questions presented above, this section consists of four parts. In the first, we establish the importance of code reuse in OSS development. Next, we present perceived benefits and issues of reuse as well as impediments to it, and address the question of why OSS developers do or do not reuse code. The third part presents the core of this study in the form of a multivariate analysis of code reuse behavior used to test our research model. In the final, fourth part we discuss potential threats to validity and limitations of our study.

5.1. Importance of Code Reuse

When measuring code reuse we focused on component and snippet reuse. In our survey, component reuse was defined as “reusing of functionality from external components in the form of libraries or included files. E.g., implementing cryptographic functionality from OpenSSL or functionality to parse INI files from an external class you have included. Please do not count functionalities from libraries that are part of your development language, such as the C libraries.” In a similar fashion, snippet reuse was defined as “reusing of snippets (several existing lines of code) copied and pasted from external sources. If you have modified the code after copying and pasting it by, e.g., renaming variables or adjusting it to a specific library you use, this would still be considered as […] reuse […]”.

Three different measures (depicted in Table 2) were employed to investigate the importance of code reuse. First, related to, for example, Cusumano and Kemerer (1990) or Frakes and Fox (1995), we asked developers to indicate the share of functionality based on reused code that was added by them to their current main project. We found that, on average, nearly one third (mean=30%, median=20%) of the functionality OSS developers have added to their project was based on reused code, which points out that code reuse is indeed an important element of OSS development. This interpretation is further supported by the fact that only six percent of the developers surveyed report that they have not reused any code at all. Furthermore, the maximum share of reused functionality of 99 percent shows that some developers rely very heavily on code reuse and see their role mainly in writing “glue-code” to integrate the various pieces of reused code. As a second measure, we employed a self-developed four-item scale to directly measure the perceived importance of reuse for the individual developers’ work on their main project.\(^{10}\) On seven-point Likert scales, developers indicated their agreement with four statements that described, in various ways, reuse as “very important.” With a mean of 4.74 (median=5.25) and 58 percent of all developers at least “somewhat agreeing” with the statements, the important role of code reuse in OSS development is again confirmed.

Finally, as the third approach, using a further self-developed four-item scale,\(^{11}\) we asked developers to indicate their intent to reuse existing code in the future development of their current main project. The results are largely similar to those obtained by using the second measure (perceived importance of reuse in past work), once more indicating that code reuse is very important. However, both mean and median are significantly lower (mean=4.57, median=4.75) than for the previous measure.

\(^{10}\) The scale was developed based on our interviews with developers and on research on general knowledge reuse (Watson and Hewett, 2006). It also draws on the intention and behavior scales commonly employed in TAM or TPB research in the IS domain, for example, by Riemenschneider et al. (2002) or by Mellarkod et al. (2007). The statements of the scale are: “Reusing has been extremely important for my past work on my current main project,” “Without reusing, my current main project would not be what it is today,” “I did reuse very much during my past work on my current main project,” and “My past work on my current main project would not have been possible without reusing.” The scale explains 83.4 percent of the total variance and Cronbach’s $\alpha$ is 0.93.

\(^{11}\) The statements of the scale are: “Reusing will be extremely important in my future work on my current main project,” “Realizing my future tasks and goals for my current main project will not be possible without reusing,” “I will reuse very much when developing my current main project in the future,” and “Realizing my future tasks and goals for my current main project will be very difficult without reusing.” The scale explains 83.8 percent of the total variance and Cronbach’s $\alpha$ is 0.94.
This finding might be a first indication supporting hypothesis $H3$, which states that code reuse is more important in earlier phases of an OSS project.

Table 2: Measures of the Importance of Code Reuse

| Measure | Mean | Median | S.D. | Min. | Max. |
|------------------------------------------------------------------------|-------|--------|-------|-------|-------|
| Share of implemented functionality based on reused code (in %) | 30.0% | 20.0% | 26.5% | 0.0% | 99.0% |
| Importance of reuse for past work on project (seven-point Likert scale)* | 4.74 | 5.25 | 1.86 | 1.00 | 7.00 |
| Importance of reuse for future work on project (seven-point Likert scale)* | 4.57 | 4.75 | 1.69 | 1.00 | 7.00 |

*Measure is based on four single items. N=624.

Despite the prominent role of code reuse consistently indicated by all three measures, the high standard deviations also reveal large heterogeneity in developers’ code reuse behavior. Developers’ individual reasons for and against code reuse in their development are suspected to largely drive this heterogeneity and will be explored in the following section.
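To make the construction of these measures concrete, the following sketch scores a four-item, seven-point Likert construct as the per-developer item mean and reports Table 2-style summary statistics; the data are synthetic, and treating a score of 5 or higher as “at least somewhat agreeing” is our reading of a standard seven-point scale, not a detail stated in the text.

```python
# Illustrative scoring of a four-item Likert construct (synthetic responses).
import numpy as np

rng = np.random.default_rng(2)
items = rng.integers(1, 8, size=(624, 4))   # four 7-point Likert items per developer
score = items.mean(axis=1)                  # construct score = item mean

print(f"mean={score.mean():.2f} median={np.median(score):.2f} "
      f"sd={score.std(ddof=1):.2f} min={score.min():.2f} max={score.max():.2f}")
# Assumed cutoff: 5 = "somewhat agree" on a 7-point scale.
print(f"share at least somewhat agreeing: {(score >= 5).mean():.0%}")
```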
5.2. Developers’ Reasons For and Against Code Reuse

In our analysis of developers’ reasons for and against code reuse, we differentiate between three sets of factors. First, we analyze the benefits of code reuse as perceived by OSS developers. Second, we investigate the drawbacks and issues that developers see in code reuse, and, finally, we address the importance of general impediments\(^{12}\) to code reuse.

\(^{12}\) While these “general impediments” are rather objective compared to developers’ beliefs about benefits and issues, they may still reflect individual developers’ opinions, having been measured by asking the developers.

Based on our interviews, as well as the existing literature, we have identified eight distinct benefits of code reuse. Survey participants were asked to indicate their agreement on a seven-point Likert scale to statements regarding these benefits. Results are displayed in Figure 2 and show that all of the statements received rather high shares of agreement. The two statements with the highest level of agreement both point to efficiency effects of reuse, followed by a statement pertaining to its effectiveness effects. For the benefits on ranks four and higher, agreement drops significantly compared to rank three, yet is still quite high. Ranked fourth and fifth are statements addressing effects of reuse on the quality of the software being developed by making it more stable and more compatible with standards. The statement ranked eighth, about the effects of code reuse on software security, also pertains to this group; however, it receives considerably less agreement. This could be explained by the fact that many OSS projects develop types of software for which security is not a major concern, for example, games. Ranked sixth and seventh are statements that position reuse as a means for developers to select their project tasks by preference and avoid mundane jobs. An example of this is “outsourcing” maintenance work to the original developers of the reused code, who fix bugs or implement new functionality in the code, from which the reusing developer benefits without having to do this work herself.

| Reuse benefits as perceived by developers (in % of developers) | Share agreement | Share disagreement |
|---------------------------------------------------------------|-----------------|--------------------|
| 1. Reusing helps developers realize their project goals/ tasks faster | 92% | 3% |
| 2. Reusing allows developers to spend their time on the most important tasks of the project | 91% | 9% |
| 3. Reusing allows developers to solve difficult problems for which they lack the expertise | 85% | 14% |
| 4. Reusing helps developers create more reliable/ stable software, e.g. less bugs | 74% | 12% |
| 5. Reusing ensures compatibility with standards, e.g. the look and feel of GUIs | 72% | 14% |
| 6. Reusing allows developers to spend their time on the development activities they have most fun doing | 67% | 24% |
| 7. Reusing allows developers to “outsource” maintenance tasks for certain parts of their code to developers outside of their project | 60% | 19% |
| 8. Reusing helps developers create more secure software, e.g. less vulnerabilities | 57% | 19% |

Note: The share of developers who are “indifferent” about the statements is not shown. N=624.

Figure 2: Share of Developers that Disagree/Agree to Reuse Benefits

In order to check the consistency of responses and to construct factor scores to be used in the multivariate analyses later, an exploratory factor analysis is carried out. With four components, it explains 77.2 percent of total variance and yields good quality measures (KMO: 0.76, p<0.0001). The resulting components can be interpreted as development efficiency (ranks 1, 2), software quality (ranks 4, 5, 8), task selection (ranks 6, 7), and development effectiveness (rank 3).

\(^{13}\) For better interpretability of the resulting components, components with an Eigenvalue of less than 1 were also extracted. The fourth component had an Eigenvalue of 0.79.

\(^{14}\) The factor analysis uses principal component analysis and Varimax rotation. Cronbach’s $\alpha$ for the components software quality, development efficiency, and task selection is 0.80, 0.72, and 0.47, respectively. See Table A3 in the Appendix for detailed factor loadings.
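The procedure named in footnote 14 (principal component extraction followed by Varimax rotation) can be sketched compactly. The implementation below is a generic textbook version run on synthetic data, not the original analysis; the 77.2 percent and KMO figures above refer to the paper’s actual item data.

```python
# Generic PCA-plus-Varimax sketch on synthetic item responses (illustrative).
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Kaiser's Varimax rotation of an (items x factors) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    prev = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        u, s, vt = np.linalg.svd(loadings.T @ (
            rotated ** 3
            - (gamma / p) * rotated @ np.diag((rotated ** 2).sum(axis=0))))
        rotation = u @ vt
        if s.sum() < prev * (1 + tol):   # stop when criterion stops improving
            break
        prev = s.sum()
    return loadings @ rotation

rng = np.random.default_rng(3)
items = rng.normal(size=(624, 8))                  # 8 benefit statements
corr = np.corrcoef(items, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = eigvals.argsort()[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

n_factors = 4                                      # components extracted
loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
print(f"variance explained: {eigvals[:n_factors].sum() / eigvals.sum():.1%}")
print(np.round(varimax(loadings), 2))              # rotated loading matrix
```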
Following the benefits of code reuse, nine issues and drawbacks identified in our interviews and the existing literature (shown in Figure 3) were presented to participants, who were again asked to indicate their agreement to the respective statements. The highest share of agreement was received by a statement pointing to the loss of control that a developer may have to accept when reusing code. Statements ranked second and third also relate to losing control, however, with significantly lower levels of agreement. The statement ranked second points to software being more difficult to install (build) and use by end-users due to technical dependencies, while the statement ranked third reflects the developer’s obligation to check and integrate updates of reused code.\(^{15}\) Ranked fourth, fifth, and eighth—and again with significantly lower levels of agreement than the previous statements—are three potential issues of code reuse that point to quality and security risks. The statements ranked sixth, seventh, and ninth all describe situations where development from scratch is more efficient than code reuse. They do, however, receive at least 50 percent disagreement, which emphasizes that most developers do not deem searching, understanding, and adapting reusable code as inefficient.

Figure 3: Share of Developers that Disagree/Agree to Reuse Issues and Drawbacks

\(^{15}\) Both statements mainly refer to component reuse and are only partially applicable to snippet reuse.

An exploratory factor analysis of these issues and drawbacks explains 69.0 percent of total variance with three components and yields good quality measures (KMO: 0.72, p<0.0001).\(^{16}\) The resulting components can be interpreted as control loss (ranks 1, 2, 3), quality risks (ranks 4, 5, 8), and inefficiency of reuse (ranks 6, 7, 9).

\(^{16}\) The factor analysis uses principal component analysis and Varimax rotation. Cronbach’s $\alpha$ for the components control loss, quality risks, and inefficiency of reuse is 0.66, 0.76, and 0.85, respectively. See Table A4 in the Appendix for detailed factor loadings.

To consolidate the number of variables in the multivariate model employed later, a further factor analysis merged the software quality benefits and the quality risks into one component. Further, the development efficiency benefits were merged with the inefficiency of reuse. The five final components used in the multivariate model are: effectiveness benefits, efficiency benefits, quality benefits, task selection benefits, and loss of control risks.

While the benefits and issues/drawbacks of code reuse were subjective and perceived by the individual developer, there also exist general impediments to reuse. These general impediments, which resulted from our interviews and the existing literature, make code reuse difficult or impossible even if the individual developer wanted to rely on existing code (see Figure 4). Interestingly, however, all four statements offered to the surveyed developers received more disagreement than agreement. The statement “there exist only very few reusable resources for my current main project” ranked first, with 39 percent of the developers agreeing. A one-way ANOVA used to identify the projects for which the fewest reusable resources exist found that only the target operating system of a project has a significant influence on the availability of reusable code (p=0.0497). Projects that are developed neither for POSIX operating systems (e.g., Linux) nor for Windows have less reusable code at their disposal.
Neither the type of the project (e.g., “Software Development,” “Scientific and Engineering,” or “Games and Entertainment”) had any significant influence (p=0.2440), nor did the graphical user interface employed by the project (p=0.1171).

Ranked as the second general impediment to code reuse, with 24 percent agreement, are license incompatibilities. Such a situation would occur, for example, if a programmer wanted to reuse code snippets licensed under the GPL in a project licensed under the BSD license. As expected, the license of the developer’s main project significantly influences this general impediment (one-way ANOVA, p<0.0001), with developers working on GPL-licensed projects least likely to perceive this as an issue. However, the low share of agreement is surprising. Three possible explanations for this finding seem plausible: First, there might exist enough reusable code in each license category. Second, developers might be able to mitigate license incompatibilities through modular project architectures that clearly separate modules under different licenses and thus avoid contamination issues (Henkel and Baldwin, 2009). Third, developers are not knowledgeable about license incompatibilities and ignore the potential issues. Ranked third and fourth, with 17 percent and nine percent agreement, respectively, are the architecture of the developer’s current main project not being modular enough to allow for easy integration of reusable code (rank 3) and incompatibilities between the project’s main programming language and the programming language of the code the developer wants to reuse (rank 4). Both are significantly dependent on the programming language of the developer’s project (one-way ANOVA, p=0.0036 and p<0.0001 for rank 3 and rank 4, respectively), with C++ and Java as object-oriented languages posing the fewest issues.

| General impediments to reuse as perceived by developers (in % of developers) |
|-----------------------------------------------------------------------------|
| 1. There exist only very few reusable resources for my current main project |
| 2. License issues make reusing in my current main project very difficult, e.g. reusing a GPL component would require the license of my current main project to be changed to GPL as well |
| 3. The software architecture of my current main project makes reusing very difficult, e.g. the architecture of my current main project is not very modular |
| 4. The programming language of my current main project makes reusing very difficult, e.g. the programming language of my current main project makes including popular libraries difficult |

Note: The share of developers who are “indifferent” about the statements is not shown. N=624.

Figure 4: Share of Developers that Disagree/Agree to General Reuse Impediments
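The group comparisons above are plain one-way ANOVAs. The following is a minimal illustration with scipy; the group labels, sizes, and means are invented, and only the shape of the test matches the analyses reported in the text.

```python
# Illustrative one-way ANOVA: does perceived lack of reusable code differ
# across target-OS groups? (Synthetic agreement scores on a 7-point scale.)
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(4)
posix = rng.normal(loc=3.4, scale=1.5, size=300)     # assumed group sizes/means
windows = rng.normal(loc=3.5, scale=1.5, size=200)
other_os = rng.normal(loc=4.0, scale=1.5, size=124)

f_stat, p_value = f_oneway(posix, windows, other_os)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")        # p < 0.05 -> means differ
```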
5.3. Multivariate Analysis of Reuse Behavior

Following the descriptive analysis, the objective of our research model is to explain the observed heterogeneity in developers’ reuse behavior found earlier with both developer and project characteristics. We test the research model with our three different measures of reuse behavior as dependent variables in three different regression models in order to ensure robustness of results.\(^{17}\) All three models are estimated using Tobit regressions, as their dependent variables are restricted to either [0-100%] or [1-7].\(^{18}\) A summary of the research model hypotheses and the support they received in the multivariate analyses is presented in Table 3, while the detailed regression tables containing the Tobit models are depicted in Table 4. As a further robustness check, we ran specifications of the three models with successive elimination of insignificant variables. The results of this robustness check, which are largely consistent with the results of the main models, are shown in Table A7 in the Appendix. The results of the multivariate analyses are presented and discussed in the following.

Table 3: Hypotheses of the Research Model and Their Support in the Multivariate Analyses

| Hypotheses | Confirmed? |
|------------|------------|
| **Attitude toward reuse:** Developers reuse more existing code… | |
| H1a: …the more strongly they perceive the effectiveness benefits of reuse. | ✓ |
| H1b: …the more strongly they perceive the efficiency benefits of reuse. | ✓ |
| H1c: …the more strongly they perceive the quality benefits of reuse. | ✓ |
| H1d: …the more strongly they perceive the task selection benefits of reuse. | ✓ |
| H1e: …the less strongly they perceive the loss of control risks of code reuse. | ✗ |
| **Access to local search:** Developers reuse more existing code… | |
| H2a: …the larger their personal OSS network. | ✓ |
| H2b: …the greater the number of OSS projects they have been involved in. | (✓) |
| **Project maturity:** | |
| H3: Developers reuse more existing code the less mature their project. | ✓ |
| **Compatibility with project goals:** Developers reuse more existing code… | |
| H4a: …the less important challenge seeking is for them as a motivation to work on their project. | ✗ |
| H4b: …the less important coding fun and enjoyment is for them as a motivation to work on their project. | ✗ |
| H4c: …the more important skill improvement is for them as a motivation to work on their project. | ✓ |
| H4d: …the more important community commitment is for them as a motivation to work on their project. | ✓ |
| H4e: …the more important OSS reputation building is for them as a motivation to work on their project. | ✗ |
| H4f: …the more important commercial signaling is for them as a motivation to work on their project. | ✗ |

Legend: ✓: fully confirmed; (✓): partially confirmed; ✗: not supported

\(^{17}\) Descriptive statistics of all explanatory variables are depicted in Table A5 in the Appendix. The correlation matrix is shown in Table A6 in the Appendix.

\(^{18}\) In contrast to an OLS regression, a Tobit model accounts for the censoring of the dependent variable. In the present case this means, for example, that the share of functionality from reused resources cannot be less than zero percent or larger than 100 percent.
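To illustrate footnote 18, the sketch below fits a two-limit Tobit model by maximum likelihood on synthetic data censored at 0 and 100, as the share-of-reused-functionality measure is. It is a generic textbook implementation, not the paper’s estimation code; all coefficients are invented.

```python
# Minimal two-limit Tobit sketch (censoring at 0 and 100), illustrative only.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(5)
n = 624
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true, sigma_true = np.array([30.0, 10.0, -5.0]), 20.0
latent = X @ beta_true + rng.normal(scale=sigma_true, size=n)
y = np.clip(latent, 0.0, 100.0)            # observed, censored share

def neg_loglik(params, lo=0.0, hi=100.0):
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)              # keep sigma positive
    mu = X @ beta
    ll = np.where(
        y <= lo, stats.norm.logcdf((lo - mu) / sigma),          # left-censored
        np.where(y >= hi, stats.norm.logsf((hi - mu) / sigma),  # right-censored
                 stats.norm.logpdf((y - mu) / sigma) - log_sigma))  # uncensored
    return -ll.sum()

start = np.append(np.linalg.lstsq(X, y, rcond=None)[0], np.log(y.std()))
fit = optimize.minimize(neg_loglik, start, method="BFGS")
print(fit.x[:-1], np.exp(fit.x[-1]))       # ~beta_true and ~sigma_true
```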
In the present case this means, for example, that the share of functionality from reused resources cannot be less than zero percent, or larger than 100 percent.\n\nTable 4: Multivariate Analysis of Developers\u2019 Reuse Behavior\n\n| | (1) Past importance of reuse (Likert scale) | (2) Past importance of reuse (percentage scale) | (3) Future importance of reuse (Likert scale) | |\n|--------------------------------|--------------------------|------------------|----------------------|-----------------------------------------------|\n| **Attitude toward reuse** | | | | |\n| BenefitEffectiveness (H1a) | 0.222*** (0.076) | 2.701*** (1.021) | 0.168*** (0.063) | |\n| BenefitEfficiency (H1b) | 0.653*** (0.084) | 5.959*** (1.114) | 0.517*** (0.069) | |\n| BenefitQuality (H1c) | 0.303*** (0.081) | 1.800* (1.073) | 0.250*** (0.067) | |\n| BenefitTaskSelection (H1d) | 0.155** (0.078) | 3.528*** (1.041) | 0.132** (0.064) | |\n| IssueControlLoss (H1e) | -0.030 (0.077) | -0.506 (1.036) | -0.004 (0.064) | |\n| **Access to local search** | | | | |\n| DevOSSNetsize (log) (H2a) | 0.165** (0.083) | 2.098* (1.102) | 0.230*** (0.069) | |\n| DevOtherProjects (H2b) | 0.022 (0.016) | 0.398* (0.208) | 0.032** (0.013) | |\n| **Project maturity** | | | | |\n| ProjPhase (H3) | -0.149** (0.070) | -3.227*** (0.928) | -0.219*** (0.057) | |\n| **Compatibility with project goals** | | | | |\n| MotChallenge (H4a) | -0.148* (0.083) | -2.559** (1.103) | -0.067 (0.068) | |\n| MotFun (H4b) | 0.098 (0.080) | 0.575 (1.072) | 0.055 (0.066) | |\n| MotLearning (H4c) | 0.003 (0.080) | -1.438 (1.053) | -0.015 (0.066) | |\n| MotCommunity (H4d) | 0.177** (0.086) | 1.964* (1.150) | 0.148** (0.071) | |\n| MotOSSReputation (H4e) | 0.005 (0.057) | 0.128 (0.758) | 0.065 (0.047) | |\n| MotSignaling (H4f) | -0.054 (0.061) | 0.336 (0.817) | 0.013 (0.051) | |\n| **Subjective norms** | | | | |\n| DevNorm | 0.140** (0.066) | 2.372*** (0.887) | 0.197*** (0.055) | |\n| **Perceived behavioral control** | | | | |\n| ProjPolSupport | 0.440** (0.200) | 0.946 (2.670) | 0.297* (0.165) | |\n| ProjPolDiscourage | -1.087** (0.457) | -4.977 (6.161) | -1.279*** (0.383) | |\n| ConditionLack | -0.250*** (0.044) | -2.317*** (0.589) | -0.168*** (0.036) | |\n| ConditionLicense | 0.065 (0.045) | 0.309 (0.599) | 0.018 (0.037) | |\n| ConditionLanguage | 0.030 (0.060) | -0.071 (0.802) | 0.060 (0.049) | |\n| ConditionArchitecture | 0.017 (0.052) | 0.481 (0.698) | 0.017 (0.043) | |\n| DevSkill | -0.075 (0.095) | -0.123 (1.270) | -0.018 (0.078) | |\n| **Further control variables** | | | | |\n| ProjSize | 0.000 (0.002) | -0.021 (0.024) | -0.002 (0.001) | |\n| ProjComplexity | 0.131 (0.092) | 2.194* (1.236) | 0.0190 (0.076) | |\n| ProjStack | 0.210** (0.091) | 1.499 (1.209) | 0.135* (0.074) | |\n| ProjStandalone | 0.118 (0.197) | 0.233 (2.633) | 0.203 (0.163) | |\n| DevOSSExperience | 0.010 (0.018) | 0.076 (0.249) | 0.000 (0.015) | |\n| DevProjTime | 0.014* (0.008) | -0.039 (0.107) | 0.008 (0.007) | |\n| DevProjShare | 0.003 (0.002) | 0.031 (0.033) | 0.001 (0.002) | |\n| DevProf | 0.056 (0.186) | 0.214 (2.492) | 0.184 (0.154) | |\n| DevEduReuse | -0.127 (0.165) | -1.177 (2.201) | -0.266* (0.136) | |\n| DevProfEduReuse | 0.603** (0.237) | 5.883* (3.094) | 0.378* (0.193) | |\n| Residence-N. America | -0.159 (0.181) | -3.310 (2.408) | 0.120 (0.149) | |\n| Residence-S. America | 0.236 (0.359) | -3.424 (4.743) | -0.013 (0.294) | |
| Residence-Asia & RoW | -0.102 (0.226) | 0.764 (3.031) | -0.109 (0.187) | |\n| Constant | 3.026*** (0.888) | 23.275* (11.87) | 2.545*** (0.731) | |\n| Observations | 624 | 624 | 624 | |\n| Pseudo R\u00b2 | 0.107 | 0.029 | 0.119 | |\n| Likelihood ratio | \u03a7\u00b2(35)=267.42, p<0.0001 | \u03a7\u00b2(35)=162.74, p<0.0001 | \u03a7\u00b2(35)=289.55, p<0.0001 | |\n| \u03c3 | 1.790 | 24.337 | 1.493 | |\n\nNotes: All models are Tobit models; standard errors in parentheses; * significant at 10%; ** significant at 5%; *** significant at 1%.\n\n5.3.1. Attitude Toward Reuse\n\nThe regression results confirm hypotheses $H1a$ to $H1d$. Developers who perceive higher effectiveness, efficiency, quality, or task selection benefits from code reuse attribute a higher importance to it and practice it more. The coefficients for all four hypotheses are positive and significant for all dependent variables and all specifications. In contrast, hypothesis $H1e$ is not confirmed. The data does not show that developers who fear losing control over their project reuse less code. This is surprising as, in our descriptive analysis, loss of control was ranked as the main issue developers have with code reuse. A plausible interpretation is that developers\u2019 concerns about losing control over their project affect their decision as to which code to reuse, but do not affect the total amount of code they reuse. For example, developers concerned about losing control might choose to reuse only components developed by other projects that have a proven track record of fixing bugs quickly and keeping the structure of their code stable (Haefliger et al., 2008).\n\n5.3.2. Access to Local Search\n\nThe effect of developers\u2019 access to local search on their reuse behavior was captured by the logarithm of the size of their OSS network ($H2a$) and the number of other OSS projects they have been involved in ($H2b$). Hypothesis $H2a$ is confirmed in all models, while $H2b$ is confirmed only partially, its coefficient not being significant in model 1. Nonetheless, all coefficients are positive in all models, supporting our assumption that developers who can access, evaluate, understand, and integrate reusable code more easily thanks to local search indeed practice more code reuse.\n\nThe finding that the number of years a developer has been involved in OSS does not exhibit a significant effect on her reuse behavior (see control variable DevOSSExperience) is consistent with our argumentation regarding local search. We had claimed that developers who can turn to their personal OSS network or their experience in other OSS projects reuse more because of their better access to local search. A greater number of years involved in OSS does not by itself facilitate such access because, for example, a developer with ten years of OSS work spent in only one project does not have access to local search regarding which code other projects use to solve a particular problem.\n\n5.3.3. Project Maturity\n
Our hypothesis that developers reuse less code once their project has matured (H3) is confirmed across all dependent variables and specifications.\(^{19}\) Developers do indeed seem to leverage reuse as a tool to deliver a \u201ccredible promise\u201d early on and overcome liabilities of newness to get on a par with competing existing projects, while later project phases call for specific refinements of their projects for which there is less available code to reuse.\n\n5.3.4. Compatibility with Project Goals\n\nRegarding the compatibility of code reuse with a developer\u2019s individual project goals, hypothesis \(H4d\) (community commitment) is confirmed in all models except model 2; \(H4a\) (challenge seeking) is confirmed only in models with past reuse as the dependent variable (models 1, 2 and 5). For all other hypotheses (coding fun and enjoyment \((H4b)\), skill improvement \((H4c)\), OSS reputation building \((H4e)\), and commercial signaling \((H4f)\)) the null hypothesis cannot be rejected.\n\nThe support for hypothesis \(H4d\) highlights that developers who feel they are part of the OSS community and want it to grow and be successful rely more on code reuse than other developers. Code reuse is compatible with their goal of contributing to the OSS community because by leveraging code reuse they can contribute more and in higher quality.\(^{20}\) The partial confirmation of hypothesis \(H4a\) supports our assumption that the developers\u2019 goal to seek and tackle technical challenges impedes code reuse. By reusing existing code, developers would be denied the pleasure of solving a problem by themselves. Thus, they would rather refrain from code reuse if challenge seeking is of major importance to them in their OSS work. The finding that the respective coefficient is not significant when the dependent variable is the developers\u2019 future intent to reuse may be due to the fact that the desire developers may have to solve a problem by themselves, without external help, is something that can occur spontaneously and is thus difficult to predict.\n\n\(^{19}\) Note that in models 1, 2, 4 and 5, where past reuse behavior is the dependent variable, the amount of reused code reported by developers with projects in later development phases is their average reuse level, including the assumed high levels of code reuse of early phases and the proposed lower levels of later phases. However, if reuse declines with maturity as proposed, then average reuse also decreases over the lifetime of a project.\n\n\(^{20}\) Moreover, developers who are more sympathetic toward the OSS community might also be affected by the general positive attitude toward reuse of this community (e.g. Raymond, 2001). This effect is, however, captured via subjective norms as a control variable.\n\nWe now turn to those hypotheses that are not supported. We had argued that, similarly to challenge seeking, the fun and enjoyment developers experience when writing code leads them to reuse less code \((H4b)\), but we cannot confirm this hypothesis. In fact, the respective coefficients are not negative as expected, but positive, though insignificant. For the remaining unconfirmed hypotheses, skill improvement \((H4c)\), OSS reputation building \((H4e)\), and commercial signaling \((H4f)\), the coefficient signs partly vary across models. This could be because, contrary to our assumptions, code reuse can be both supportive of and detrimental to these goals.
While reused code could be used as an example to improve programming skills, it could also hamper learning if developers treat the reused code as a black box. Regarding reputation building and commercial signaling, we had expected that developers who make their projects more successful with the help of code reuse are regarded more highly in the OSS community and can present themselves as better developers to potential employers or business partners. However, it is also possible that in certain situations the code created by developers themselves without the help of code reuse is important to build their OSS reputation or signal skills to potential employers and partners. In these situations, developers would refrain from code reuse if reputation building or signaling is a main motivation for their OSS work.\n\n5.3.5. Control Variables\n\nDue to the large number of control variables included in our model, we only point out a few main results. The social norms as perceived by developers show a consistently significant and positive influence, as predicted by TPB. Consequently, OSS developers who feel that their peers appreciate them reusing existing code will reuse more. Of the variables describing developers\u2019 perceived behavioral control, the lack of reusable code has a consistently negative and significant influence on reuse behavior. With the exception of one dependent variable, project policies discouraging reuse lead to reduced code reuse, while policies promoting reuse are found to significantly increase reuse behavior in three models (1, 4, 6). Lastly, developers who had received training on reuse in companies practice significantly more code reuse, while developers who had only learned about reuse during their academic education do not differ in their code reuse behavior from developers who had not had reuse in their curriculum.\n\nTo summarize, the regression analyses shed light on developers\u2019 code reuse behavior. In particular, the (partially) confirmed hypotheses $H2$ (access to local search), $H3$ (project maturity), and $H4a$ (challenge seeking) provide interesting findings that are also relevant beyond the scope of OSS.\n\n5.4. Possible Threats to Validity and Limitations of the Study\n\nIn the following, we employ the four generally accepted criteria of validity (Cook and Campbell, 1979) as our structure: construct validity, internal validity, statistical conclusion validity, and external validity.\n\nConstruct validity threats concern the ability to measure what we are interested in measuring. As pointed out in sections 4 and 5, the measures employed in this study are based on existing measures from other studies and our interviews. All measures were assessed for clarity by other researchers and OSS developers during pretests as described above. Furthermore, all multi-item constructs were quantitatively gauged with regard to reliability, convergent validity, and discriminant validity. We thus consider our study to possess sufficient construct validity. Nonetheless, a potential issue is whether developers are able to accurately estimate their level of code reuse in a questionnaire. However, while an additional verification of our results using an objective measure of code reuse is certainly worthwhile, developers in our pretests convinced us that they can, with considerable precision, estimate their degree of code reuse. Furthermore, to ensure robustness of our findings, we have employed three different measures of code reuse in the survey.
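\nFor reference, the reliability figures for these multi-item constructs (Cronbach\u2019s $\alpha$, reported in footnote 16 and Table A1) can be computed directly from the item responses. The following minimal sketch uses hypothetical data and is an illustration rather than the authors\u2019 code:\n\n```python\nimport numpy as np\n\ndef cronbach_alpha(items: np.ndarray) -> float:\n    # items: (n_respondents, k_items) matrix of scale responses.\n    k = items.shape[1]\n    item_vars = items.var(axis=0, ddof=1)      # per-item variances\n    total_var = items.sum(axis=1).var(ddof=1)  # variance of the scale sum\n    return k / (k - 1) * (1 - item_vars.sum() / total_var)\n\n# Hypothetical 7-point responses to three items of one construct.\nchal = np.array([[7, 6, 7], [5, 5, 4], [6, 7, 6], [3, 4, 3], [6, 5, 6]])\nprint(round(cronbach_alpha(chal), 3))\n```\n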
Finally, many other reuse studies also rely on reported reuse levels (e.g. Frakes and Fox, 1995; Lee and Litecky, 1997).\n\nInternal validity, maintaining that there should not exist alternative explanations for the relationships identified between our research model constructs, should also be given, since our research model relies on the well-established TPB and because we have included multiple further control variables derived from our interviews and the OSS and reuse literature. A potential issue is our approach of dealing with component and snippet reuse simultaneously. If component reuse in OSS development equaled black-box reuse, there might exist different drivers for it than for snippet reuse. However, because we find that about 50% of the surveyed developers modify the components they reuse, we argue that, at least in the OSS context, component reuse does not constitute typical black-box reuse. Consequently, we expect both component and snippet reuse to be influenced by largely the same drivers.\n\nIn addition, we consider our results to be valid with regard to our statistical conclusions, since they are based on a sample of considerable size and backed by the significance levels of our hypotheses as well as the largely consistent results in various model specifications and with various dependent variables.\n\nFinally, external validity threats concern the generalization of our findings. In line with the other main studies of individual OSS developers, we drew our sample from SourceForge.net developers. As pointed out in section 4, we have no reason to believe that our sample is not representative of SourceForge.net developers. Thus, generalization for this most frequently researched group of OSS developers should be feasible. To ensure external validity when generalizing to OSS developers registered on other platforms (where, e.g., projects are larger) or to traditional software developers working on proprietary software in commercial firms, it would be necessary to replicate our study in these settings. However, both our data and our research model suggest that generalization to other contexts should yield similar results. For example, on the data side, we do not find significant differences between the reuse behavior of paid and hobbyist OSS developers. Regarding the research model, it would be surprising to find that rather general hypotheses such as the effect of network size or challenge seeking work differently in the context of proprietary software development.\n\n6. Conclusion\n\nIn this paper, we set out to use quantitative data obtained through a survey to explain and understand code reuse in OSS projects. Contributing to the emerging stream of scholarly work on code reuse in OSS, we present strong evidence that code reuse is of major importance in OSS development and has contributed to its success. We further show that OSS developers perceive efficiency and effectiveness as the main benefits of code reuse. Of relevance not only to OSS research but also to the domains of software engineering and the receiving side of open innovation processes in general, our investigation of drivers of code reuse finds that developers with better access to local search due to a larger personal OSS network or more exposure to different OSS projects reuse more existing code, presumably because their costs of accessing this code are lower.
Further, developers convinced of the benefits of code reuse (efficiency and effectiveness gains, enhanced software quality, and the chance to work on preferred tasks) practice it more, as do developers who can use code reuse to support their goal of serving the OSS community. Moreover, developers see code reuse as a means to kick-start new projects, as it helps them deliver a \u201ccredible promise\u201d and close the gap to existing and competing projects more quickly. Lastly, we find partial support for our hypothesis that developers who desire to solve technical problems for the satisfaction it provides refrain from reuse and, thus, make their projects less efficient and effective than they could be.\n\nAs academic work on code reuse in OSS has only just begun, it merits further research. While our study has addressed development with reuse, future work should investigate development for reuse, that is, OSS projects which develop components primarily intended to be reused in other projects. Questions of relevance in this context are: why do developers bear the reportedly large additional costs of writing reusable code,\(^{21}\) or have they found ways to mitigate them? Additionally, as has been pointed out by Haefliger et al. (2008), the strategies that OSS developers employ to make their reusable code known and reused deserve investigation. Moreover, the limitations of our work open up several further research avenues. First, our dependent variables reflect developers\u2019 subjective perception of the importance of code reuse for their OSS work. Alternatively, and potentially adding robustness to our findings, the importance of reuse could be captured more objectively by analyzing the code of a project. Similarly, independent variables captured from other data sources could be added to our model. For example, social network data derived from SourceForge.net (e.g. Fershtman and Gandal, 2009) could be employed to further extend and test our hypotheses on local search. Moreover, we have described code reuse in general, not differentiating between its various forms (components, snippets, algorithms). A more fine-grained analysis using these dimensions might yield further insights into the mechanics of code reuse in OSS projects. Finally, while we have focused on developers and their projects as determinants of code reuse, future work could employ an even more detailed approach and analyze single reuse incidents, incorporating developers, their projects, and the artifacts they consider for reuse. Such an approach could, for instance, analyze the impact of the quality of the relationship between the \u201cgiving\u201d and the \u201creceiving\u201d side of the open innovation process on code reuse.\n\n\(^{21}\) For example, Tracz (1995) estimates that writing reusable code leads to 100 percent additional effort.\n\nBeyond their scholarly implications, our findings are also of relevance to managerial practice. They highlight the high level of reuse within the OSS community, which should motivate firms to also leverage existing OSS code in their software development,\(^{22}\) thereby partly mitigating the typically high upfront investment costs of building an internal reuse library for artifacts that are not firm-specific (Frakes and Kang, 2005).
Further, if they intend to pursue this avenue of reusing OSS code, commercial firms should encourage and support their employees to enhance their access to local search for OSS code by building personal OSS networks and by becoming involved in various OSS projects. Beyond reuse of OSS code, modified incentives and development processes based on our findings could support internal corporate reuse activities in software engineering and beyond. As part of such modifications, developers could be provided with the option to select tasks themselves according to their preference; they could be compensated according to the work results they deliver rather than the time they have spent at work; and they could be required to deliver \u201ccredible promises\u201d in new development projects (Haefliger et al., 2008). Lastly, to accommodate the desire of developers to tackle difficult technical challenges, which makes them reuse less than they could, firms could consider job enrichment (e.g. Herzberg, 1968) to integrate challenges into developers\u2019 work that are in the best interest of the firm, thereby serving the needs of both developer and firm.\n\n\(^{22}\) Obviously, this has to be in accordance with the licenses of the OSS code. However, well-designed product architectures can mitigate many of the issues potentially arising here (Henkel and Baldwin, 2009).\n\n7. References\n\nAjzen, I. (1991) \"The Theory of Planned Behavior,\" *Organizational Behavior and Human Decision Processes* 50 (2), pp. 179-211.\n\nAjzen, I. (2002) \"Constructing a TpB Questionnaire: Conceptual and Methodological Considerations,\" Manuscript, University of Massachusetts, Available at URL: http://people.umass.edu/aizen/pdf/tpb.measurement.pdf.\n\nAldrich, H. and E. Auster (1986) \"Even Dwarfs Started Small: Liabilities of Age and Size and Their Strategic Implications,\" in Cummings, L. and B. Staw (Eds.) *Research in Organizational Behavior*, San Francisco, CA: JAI Press, pp. 165-198.\n\nAmabile, T.M., K.G. Hill, A. Hennessey, and E.M. Tighe (1994) \"The Work Preference Inventory: Assessing Intrinsic and Extrinsic Motivational Orientations,\" *Journal of Personality and Social Psychology* 66 (5), pp. 950-967.\n\nArmitage, C. and M. Conner (2001) \"The Theory of Planned Behavior,\" *British Journal of Social Psychology* 40 (4), pp. 471-499.\n\nBaldwin, C.Y. and K.B. Clark (2006) \"The Architecture of Participation: Does Code Architecture Mitigate Free Riding in the Open Source Development Model?,\" *Management Science* 52 (7), pp. 1116-1127.\n\nBanker, R.D., R.J. Kauffman, and D. Zweig (1993) \"Repository Evaluation of Software Reuse,\" *IEEE Transactions on Software Engineering* 19 (4), pp. 379-389.\n\nBonaccorsi, A., S. Giannangeli, and C. Rossi (2006) \"Entry Strategies under Competing Standards: Hybrid Business Models in the Open Source Software Industry,\" *Management Science* 52 (7), pp. 1085-1098.\n\nChang, H.-F.A. and A. Mockus (2008) \"Evaluation of Source Code Copy Detection Methods on FreeBSD,\" *International Working Conference on Mining Software Repositories*, Leipzig, Germany.\n\nChesbrough, H.W. (2003) *Open Innovation. The New Imperative for Creating and Profiting from Technology*. Boston, MA: Harvard Business School Press.\n\nClary, E.G., M. Snyder, R.D. Ridge, J. Copeland, A.A. Stukas, and J. Haugen (1998) \"Understanding and Assessing the Motivations of Volunteers: A Functional Approach,\" *Journal of Personality and Social Psychology* 74 (6), pp. 1516-1530.\n\nCook, T.D. and D.T. Campbell (1979) *Quasi-Experimentation: Design and Analysis Issues for Field Settings*. Chicago, IL: Rand McNally.\n
Crowston, K. and B. Scozzi (2008) \"Bug Fixing Practices within Free/Libre Open Source Software Development Teams,\" *Journal of Database Management* 19 (2), pp. 1-30.\n\nCrowston, K., K. Wei, J. Howison, and A. Wiggins (2009) \"Free/Libre Open Source Software Development: What We Know and What We Do Not Know,\" (07.07.2009), Working Paper, Available at URL: http://floss.syr.edu/StudyP/Review%20Paper_070709.pdf.\n\nCs\u00edkszentmih\u00e1lyi, M. (1990) *Flow: The Psychology of Optimal Experience*. New York, NY: Harper and Row.\n\nCusumano, M. and C. Kemerer (1990) \"A Quantitative Analysis of U.S. and Japanese Practice in Software Development,\" *Management Science* 36 (11), pp. 1384-1406.\n\nDahlander, L. (2005) \"Appropriation and Appropriability in Open Source Software,\" *International Journal of Innovation Management* 9 (3), pp. 259-285.\n\nDavis, F.D., R.P. Bagozzi, and P.R. Warshaw (1989) \"User Acceptance of Computer Technology: A Comparison of Two Theoretical Models,\" *Management Science* 35 (8), pp. 982-1002.\n\nDesouza, K.C., Y. Awazu, and A. Tiwana (2006) \"Four Dynamics for Bringing Use Back into Software Reuse,\" *Communications of the ACM* 49 (1), pp. 96-100.\n\nDiBona, C. (2005) \"Open Source and Proprietary Software Development,\" in DiBona, C., D. Cooper, and M. Stone (Eds.) *Open Source 2.0: The Continuing Evolution*, Sebastopol, CA: O'Reilly Media.\n\nDiBona, C., S. Ockman, and M. Stone (1999) \"Introduction,\" in DiBona, C., S. Ockman, and M. Stone (Eds.) *Open Sources: Voices of the Open Source Revolution*, Sebastopol, CA: O'Reilly & Associates, pp. 1-17.\n\nFershtman, C. and N. Gandal (2009) \"R&D Spillovers: The 'Social Network' of Open Source,\" (16.05.2009), Working Paper, Available at URL: http://www.tau.ac.il/~gandal/OSS.pdf.\n\nFornell, C. and F. Larcker (1981) \"Evaluating Structural Equation Models with Unobservable Variables and Measurement Error,\" *Journal of Marketing Research* 18 (1), pp. 39-50.\n\nFrakes, W.B. and C.J. Fox (1995) \"Sixteen Questions About Software Reuse,\" *Communications of the ACM* 38 (6), pp. 75-87.\n\nFrakes, W.B. and K. Kang (2005) \"Software Reuse Research: Status and Future,\" *IEEE Transactions on Software Engineering* 31 (7), pp. 529-536.\n\nGerman, D.M. (2007) \"Using Software Distributions to Understand the Relationship among Free and Open Source Software Projects,\" *4th International Workshop on Mining Software Repositories*, Minneapolis, MN.\n\nGhosh, R.A., R. Glott, B. Krieger, and G. Robles (2002) \"Free/Libre and Open Source Software: Survey and Study - Deliverable D18: Final Report - Part IV: Survey of Developers,\" Available at URL: http://www.infonomics.nl/FLOSS/report/FLOSS_Final4.pdf.\n\nGruber, M. and J. Henkel (2005) \"New Ventures Based on Open Innovation - an Empirical Analysis of Start-up Firms in Embedded Linux,\" *International Journal of Technology Management* 33 (4), pp. 354-372.\n\nHaefliger, S., G. von Krogh, and S. Spaeth (2008) \"Code Reuse in Open Source Software,\" *Management Science* 54 (1), pp. 180-193.\n\nHair, J.F., Jr., R.L. Tatham, R.E. Anderson, and W. Black (2006) *Multivariate Data Analysis*. Upper Saddle River, NJ: Pearson Prentice Hall.\n\nHardgrave, B.C., F.D. Davis, and C.K. Riemenschneider (2003) \"Investigating Determinants of Software Developers' Intentions to Follow Methodologies,\" *Journal of Management Information Systems* 20 (1), pp. 123-151.\n
Riemenschneider (2003) \"Investigating Determinants of Software Developers' Intentions to Follow Methodologies,\" *Journal of Management Information Systems* 20 (1), pp. 123-151.\n\nHardgrave, B.C. and R.A. Johnson (2003) \"Toward an Information Systems Development Acceptance Model: The Case of Object-Oriented Systems Development,\" *IEEE Transactions on Engineering Management* 50 (3), pp. 322-336.\n\nHars, A. and S. Ou (2002) \"Working for Free? Motivations for Participating in Open-Source Projects,\" *International Journal of Electronic Commerce* 6 (3), pp. 25-39.\n\nHenkel, J. (2006) \"Selective Revealing in Open Innovation Processes: The Case of Embedded Linux,\" *Research Policy* 35 (7), pp. 953-969.\n\nHenkel, J. (2009) \"Champions of Revealing - the Role of Open Source Developers in Commercial Firms,\" *Industrial and Corporate Change* 18 (3), pp. 435-471.\n\nHenkel, J. and C.Y. Baldwin (2009) \"Modularity for Value Appropriation: Drawing the Boundaries of Intellectual Property,\" (March 2009), Working Paper, Harvard Business School.\n\nHertel, G., S. Niedner, and S. Hermann (2003) \"Motivation of Software Developers in the Open Source Projects: An Internet-Based Survey of Contributors to the Linux Kernel,\" *Research Policy* 32 (7), pp. 1159-1177.\n\nHerzberg, F. (1968) \"One More Time: How Do You Motivate Employees?,\" *Harvard Business Review* 46 (1), pp. 53-62.\n\nIsoda, S. (1995) \"Experience of a Software Reuse Project,\" *Journal of Systems and Software* 30, pp. 171-186.\n\nKim, Y.E. and E.A. Stohr (1998) \"Software Reuse: Survey and Research Directions,\" *Journal of Management Information Systems* 14 (4), pp. 113-147.\n\nKrueger, C.W. (1992) \"Software Reuse,\" *ACM Computer Surveys* 24 (2), pp. 131-183.\n\nLakhani, K.R. and E. von Hippel (2003) \"How Open Source Software Works: \"Free\" User-to-User Assistance,\" *Research Policy* 32 (6), pp. 923-943.\n\nLakhani, K.R. and R.G. Wolf (2005) \"Why Hackers Do What They Do: Understanding Motivation and Effort in Free/Open Source Software Projects,\" in Feller, J., B. Fitzgerald, S. Hissam, and K.R. Lakhani (Eds.) *Perspectives on Free and Open Source Software*, Cambridge, MA: MIT Press, pp. 3-22.\n\nLanglois, R.N. (1999) \"Scale, Scope, and the Reuse of Knowledge,\" in Dow, S.C. and P.E. Earl (Eds.) *Economic Organization and Economic Knowledge*, Cheltenham, UK: Edward Elgar, pp. 239-254.\n\nLee, N.-Y. and C.R. Litecky (1997) \"An Empirical Study of Software Reuse with Special Attention to Ada,\" *Transactions on Software Engineering* 23 (9), pp. 537-549.\n\nLerner, J. and J. Tirole (2002) \"Some Simple Economics of Open Source,\" *The Journal of Industrial Economics* 50 (2), pp. 197-234.\nMajchrak, A., L.P. Cooper, and O.P. Neece (2004) \"Knowledge Reuse for Innovation,\" *Management Science* 50 (2), pp. 174-188.\n\nMellarkod, V., R. Appan, D.R. Jones, and K. Sherif (2007) \"A Multi-Level Analysis of Factors Affecting Software Developers' Intention to Reuse Software Assets: An Empirical Investigation,\" *Information & Management* 44 (7), pp. 613-625.\n\nMockus, A. (2007) \"Large-Scale Code Reuse in Open Source Software,\" *1st International Workshop on Emerging Trends in FLOSS Research and Development*, Minneapolis, MN.\n\nMoore, G.C. and I. Benbasat (1991) \"Development of an Instrument to Measure the Perceptions of Adopting an Information Technology Innovation,\" *Information Systems Research* 2 (3), pp. 192-222.\n\nMorisio, M., M. Ezran, and C. 
Tully (2002) \"Success and Failure Factors in Software Reuse,\" *IEEE Transactions on Software Engineering* 28 (4), pp. 340-357.\n\nNaur, P. and B. Randell (1968) *Software Engineering; Report on a Conference by the Nato Science Committee*. Brussels, Belgium: NATO Science Affairs Division.\n\nNunnally, J.C. (1978) *Psychometric Theory*. New York, NY: McGraw-Hill.\n\nPodsakoff, P.M., S.B. MacKenzie, J. Lee, and N.P. Podsakoff (2003) \"Common Method Biases in Behavioral Research: A Critical Review of the Literature and Recommended Remedies,\" *Journal of Applied Psychology* 88 (5), pp. 879-903.\n\nRavichandran, T. and M.A. Rothenberger (2003) \"Software Reuse Strategies and Component Markets,\" *Communications of the ACM* 46 (8), pp. 109-114.\n\nRaymond, E.S. (2001) *The Cathedral and the Bazaar*. Sebastopol, CA: O'Reilly & Associates 2nd Edition.\n\nRiemenschneider, C.K. and B.C. Hardgrave (2001) \"Explaining Software Development Tool Use with the Technology Acceptance Model,\" *Journal of Computer Information Systems* 41 (4), pp. 1-8.\n\nRiemenschneider, C.K., B.C. Hardgrave, and F.D. Davis (2002) \"Explaining Software Developer Acceptance of Methodologies: A Comparison of Five Theoretical Models,\" *IEEE Transactions on Software Engineering* 28 (12), pp. 1135-1145.\n\nRoberts, J.A., I. Hann, and S.A. Slaughter (2006) \"Understanding the Motivations, Participation, and Performance of Open Source Software Developers: A Longitudinal Study of the Apache Projects,\" *Management Science* 52 (7), pp. 984-999.\n\nRossi Lamastra, C. (2009) \"Software Innovativeness: A Comparison between Proprietary and Free/Open Source Solutions Offered by Italian SMEs,\" *R&D Management* 39 (2), pp. 153-169.\n\nSen, A. (1997) \"The Role of Opportunism in the Software Design Reuse Process,\" *IEEE Transactions of Software Engineering* 23 (7), pp. 418-436.\n\nSen, R., C. Subramaniam, and M.L. Nelson (2008) \"Determinants of the Choice of Open Source Software License,\" *Journal of Management Information Systems* 25 (3), pp. 207-239.\n\nElectronic copy available at: https://ssrn.com/abstract=1489789\nSherif, K., R. Appan, and Z. Lin (2006) \"Ressources and Incentives for the Adoption of Systematic Software Reuse,\" *International Journal of Information Management* 26 (1), pp. 70-80.\n\nSpaeth, S., M. Stuermer, S. Haefliger, and G. Von Krogh (2007) \"Sampling in Open Source Software Development: The Case for Using the Debian GNU/Linux Distribution,\" *40th Annual Hawaii International Conference on System Sciences*, Waikoloa, HI.\n\nStewart, K.J. and S. Gosain (2006) \"The Impact of Ideology on Effectiveness in Open Source Software Teams,\" *MIS Quarterly* 30 (2), pp. 291-314.\n\nStraub, D. (1989) \"Validating Instruments in MIS Research,\" *MIS Quarterly* 13 (2), pp. 147-169.\n\nSubramanyam, R. and M. Xia (2008) \"Free/Libre Open Source Software Development in Developing and Developed Countries: A Conceptual Framework with an Exploratory Study,\" *Decision Support Systems* 46 (1), pp. 173-186.\n\nTracz, W. (1995) *Confessions of a Used Program Salesman: Institutionalizing Software Reuse*. Reading, MA: Addison-Wesley.\n\nvon Krogh, G., S. Spaeth, and S. Haefliger (2005) \"Knowledge Reuse in Open Source Software: An Exploratory Study of 15 Open Source Projects,\" *38th Annual Hawaii International Conference on System Sciences*, Big Island, HI.\n\nvon Krogh, G., S. Spaeth, S. Haefliger, and M. 
Wallin (2008) \"Open Source Software: What We Know (and Do Not Know) About Motives to Contribute,\" (April 2008), Working Paper, DIME Working Papers on Intellectual Property, Available at URL: http://www.dime-eu.org/files/active/0/WP38_vonKroghSpaethHaefligerWallin_IPROSS.pdf.\n\nvon Krogh, G., S. Spaeth, and K.R. Lakhani (2003) \"Community, Joining, and Specialization in Open Source Software Innovation: A Case Study,\" *Research Policy* 32 (7), pp. 1217-1241.\n\nWatson, S. and K. Hewett (2006) \"A Multi-Theoretical Model of Knowledge Transfer in Organizations: Determinants of Knowledge Contribution and Knowledge Reuse,\" *Journal of Management Studies* 43 (2), pp. 141-173.\n\nWest, J. (2003) \"How Open Is Open Enough? Melding Proprietary and Open Source Platform Strategies,\" *Research Policy* 32 (7), pp. 1259-1285.\n\nWu, C.-G., J.H. Gerlach, and C.E. Young (2007) \"An Empirical Analysis of Open Source Software Developers\u2019 Motivations and Continuance Intentions,\" *Information & Management* 44 (3), pp. 253-262.\n\nYe, Y. and G. Fischer (2005) \"Reuse-Conducive Development Environments,\" *Automated Software Engineering* 12 (2), pp. 199-235.\n### Appendix\n\n#### Table A1: Factor Analysis and Reliability of Developer Motivation Constructs\n\n| Construct/item | 1 | 2 | 3 | 4 | 5 | 6 | Cronbach\u2019s \u03b1 |\n|----------------|-------|-------|-------|-------|-------|-------|--------------|\n| **1. Challenge seeking** | | | | | | | 0.807 |\n| Chal1 | 0.052 | 0.794 | 0.137 | 0.203 | 0.007 | 0.043 | |\n| Chal2 | -0.031| 0.891 | 0.119 | 0.135 | 0.034 | 0.019 | |\n| Chal3 | 0.020 | 0.794 | 0.075 | 0.172 | -0.026| 0.026 | |\n| **2. Coding fun and enjoyment** | | | | | | | 0.746 |\n| Fun1 | 0.021 | 0.176 | 0.122 | 0.763 | -0.024| 0.111 | |\n| Fun2 | -0.008| 0.284 | 0.217 | 0.718 | 0.100 | 0.005 | |\n| Fun3 | 0.038 | 0.165 | 0.077 | 0.839 | 0.010 | 0.002 | |\n| **3. Community commitment** | | | | | | | 0.640 |\n| Com1 | -0.068| 0.043 | 0.109 | 0.055 | 0.154 | 0.743 | |\n| Com2 | 0.138 | 0.112 | 0.010 | 0.027 | -0.099| 0.691 | |\n| Com3 | -0.051| -0.017| 0.089 | 0.033 | 0.186 | 0.832 | |\n| **4. Skill improvement** | | | | | | | 0.758 |\n| Learn1 | 0.101 | 0.148 | 0.832 | 0.162 | 0.003 | 0.044 | |\n| Learn2 | 0.192 | 0.120 | 0.831 | 0.159 | 0.027 | 0.058 | |\n| Learn3 | 0.034 | 0.093 | 0.721 | -0.005| 0.190 | 0.125 | |\n| **5. OSS reputation building** | | | | | | | 0.901 |\n| OSSRep1 | 0.253 | -0.004| 0.053 | 0.035 | 0.892 | 0.098 | |\n| OSSRep2 | 0.240 | 0.021 | 0.055 | 0.010 | 0.900 | 0.091 | |\n| **6. Commercial signaling** | | | | | | | 0.866 |\n| ComSig1 | 0.847 | 0.004 | 0.178 | 0.065 | 0.095 | 0.019 | |\n| ComSig2 | 0.857 | -0.027| 0.087 | -0.007| 0.250 | -0.016| |\n| ComSig3 | 0.800 | 0.056 | 0.045 | -0.009| 0.359 | -0.031| |\n\nNotes: The factor analysis uses principal component analysis and Varimax rotation; high factor loadings under each component in the rotated matrix are indicated by bold text and gray shading.\n\nN=624.\n\n#### Table A2: Discriminant Analysis of Developer Motivation Constructs\n\n| Construct/item | 1 | 2 | 3 | 4 | 5 | 6 |\n|----------------|-------|-------|-------|-------|-------|-------|\n| **1. Challenge seeking** | | | | | | 0.757 |\n| **2. Coding fun and enjoyment** | 0.444***| | | | | 0.705 |\n| **3. Community commitment** | 0.112***| 0.132***| | | | 0.657 |\n| **4. Skill improvement** | 0.285***| 0.323***| 0.207***| | | 0.751 |\n| **5. OSS reputation building** | 0.033 | 0.064 | 0.194***| 0.189***| | 0.906 |\n| **6. 
Commercial signaling** | 0.047 | 0.063 | 0.026 | 0.254*** | 0.495*** | 0.832 |\n\nNotes: The diagonal bolded entries are square roots of the average variance extracted (AVE) of the respective construct; the off-diagonal entries are standardized correlations between constructs; * correlation significant at 10%; ** correlation significant at 5%; *** correlation significant at 1% level.\n\nN=624.\n\n### Table A3: Exploratory Factor Analysis of Reuse Benefits\n\n| Item (Rank in Figure 2) | 1 | 2 | 3 | 4 |\n|----------------------------------|-------|-------|-------|-------|\n| Difficult Problem (Rank 3) | 0.081 | 0.171 | 0.090 | **0.948** |\n| Faster (Rank 1) | 0.181 | **0.793** | -0.001 | 0.326 |\n| Most Important (Rank 2) | 0.176 | **0.834** | 0.236 | 0.062 |\n| Most Fun (Rank 6) | -0.021 | 0.414 | **0.743** | 0.021 |\n| Outs Maintenance (Rank 7) | 0.332 | -0.029 | **0.779** | 0.162 |\n| Reliable SW (Rank 4) | **0.840** | 0.278 | 0.130 | -0.031 |\n| Secure SW (Rank 8) | **0.872** | 0.124 | 0.113 | 0.090 |\n| Standard SW (Rank 5) | **0.739** | 0.002 | 0.097 | 0.237 |\n\nNotes: The factor analysis uses principal component analysis and Varimax rotation; high factor loadings under each component in the rotated matrix are indicated by bold text and gray shading.\n\nN=624.\n\n### Table A4: Exploratory Factor Analysis of Reuse Issues and Drawbacks\n\n| Item (Rank in Figure 3) | 1 | 2 | 3 |\n|----------------------------------|-------|-------|-------|\n| Finding (Rank 9) | **0.854** | 0.089 | 0.036 |\n| Understanding (Rank 7) | **0.876** | 0.125 | 0.073 |\n| Adapting (Rank 6) | **0.847** | 0.165 | 0.087 |\n| Quality Risks (Rank 5) | 0.156 | **0.934** | 0.100 |\n| Security Risks (Rank 4) | 0.088 | **0.935** | 0.084 |\n| Performance Loss (Rank 8)* | 0.231 | **0.451** | 0.284 |\n| Installation (Rank 2) | 0.152 | 0.089 | **0.764** |\n| Dependence (Rank 1) | -0.051 | 0.118 | **0.785** |\n| Additional Work (Rank 3) | 0.162 | 0.162 | **0.707** |\n\nNotes: The factor analysis uses principal component analysis and Varimax rotation; high factor loadings under each component in the rotated matrix are indicated by bold text and gray shading.
*The loading of this item on its construct is rather low; however, it is retained due to the good overall Cronbach\u2019s $\alpha$ of the construct (0.76).\n\nN=624.\n\nTable A5: Descriptive Statistics of Explanatory Variables Used in Table 4\n\n| Variable | Dummy variable equal to \u201c1\u201d if\u2026 | Frequency of \u201c0\u201d | Frequency of \u201c1\u201d |\n|-------------------|----------------------------------|------------------|------------------|\n| ProjPolSupport | Developer\u2019s current main project has a policy encouraging its developers to reuse | 438 (70%) | 186 (30%) |\n| ProjPolDiscourage | Developer\u2019s current main project has a policy discouraging its developers from reuse | 606 (97%) | 18 (3%) |\n| ProjStandalone | Developer\u2019s current main project is a standalone executable application project and not a component project | 162 (26%) | 462 (74%) |\n| DevProf | Developer is working as professional developer or has worked as professional developer for a firm | 191 (31%) | 433 (69%) |\n| DevEduReuse | Developer has received training on reuse during her education | 412 (66%) | 212 (34%) |\n| DevProfEduReuse | Developer has received training on reuse when working as software developer for a firm | 544 (87%) | 80 (13%) |\n| Residence-N.America | Developer resides in North America | 455 (73%) | 169 (27%) |\n| Residence-S.America | Developer resides in South America | 594 (95%) | 30 (5%) |\n| Residence-Asia&RoW | Developer resides in Asia, Africa, Australia, or Oceania | 536 (86%) | 88 (14%) |\n\n| Variable | Explanation | Min. | Max. | Med. | Mean | S.D. |\n|-------------------|-----------------------------------------------------------------------------|-------|-------|-------|-------|-------|\n| Benefit-Effectiveness | Factor score from exploratory factor analysis\u2026 on developer\u2019s perception of effectiveness effects of code reuse | -4.762 | 2.047 | 0.178 | 0 | 1 |\n| Benefit-Efficiency | \u2026on developer\u2019s perception of efficiency effects of code reuse | -3.568 | 2.313 | 0.093 | 0 | 1 |\n| BenefitQuality | \u2026on developer\u2019s perception of quality effects of code reuse | -3.972 | 2.909 | -0.027 | 0 | 1 |\n| Benefit-TaskSelection | \u2026on developer\u2019s perception of task selection effects of code reuse | -3.884 | 3.026 | 0.033 | 0 | 1 |\n| Issue-ControlLoss | \u2026on developer\u2019s perception of control loss effects of code reuse | -3.781 | 2.376 | 0.065 | 0 | 1 |\n| DevOSS-Netsize (log) | Size of developer\u2019s personal OSS network (as logarithm) | 0 | 6.217 | 2.197 | 2.001 | 1.033 |\n| DevOtherProjects | Number of OSS projects besides current main project, that developer has ever been involved in | 0 | 48 | 2 | 3.617 | 5.388 |\n| ProjPhase | Development phase of developer\u2019s current main project (1=Pre-Alpha, 2=Alpha, 3=Beta, 4=Stable/Production, 5=Mature) | 1 | 5 | 3 | 3.221 | 1.184 |\n| MotChallenge | Index variable constructed from challenge scale (1=Strongly disagree,\u2026, 7=Strongly agree) | 1 | 7 | 5.333 | 5.128 | 1.060 |\n| MotFun | Index variable constructed from fun scale (1=Strongly disagree,\u2026, 7=Strongly agree) | 1.667 | 7 | 5.000 | 5.152 | 1.092 |\n| MotLearning | Index variable constructed from learning scale (1=Strongly disagree,\u2026, 7=Strongly agree) | 1 | 7 | 5.333 | 5.317 | 1.100 |
| Mot-Community | Index variable constructed from community commitment scale (1=Strongly disagree,\u2026, 7=Strongly agree) | 1 | 7 | 5.667 | 5.614 | 1.003 |\n| MotOSS-Reputation | Index variable constructed from OSS reputation scale (1=Strongly disagree,\u2026, 7=Strongly agree) | 1 | 7 | 4.000 | 3.609 | 1.621 |\n| MotSignaling | Index variable constructed from signaling scale (1=Strongly disagree,\u2026, 7=Strongly agree) | 1 | 7 | 4.667 | 4.312 | 1.527 |\n| DevNorm | Index variable constructed from subjective norms scale (1=Strongly disagree,\u2026, 7=Strongly agree) | 1 | 7 | 4.000 | 3.927 | 1.555 |\n| ConditionLack | Developer\u2019s agreement (1=Strongly disagree,\u2026, 7=Strongly agree) to\u2026 lack of reusable code as impediment to reuse | 1 | 7 | 4 | 3.784 | 1.823 |\n| Condition-License | \u2026 issues with license incompatibilities as impediment to reuse | 1 | 7 | 2 | 3.006 | 1.852 |\n| Condition-Language | \u2026 issues with programming language incompatibilities as impediment to reuse | 1 | 7 | 2 | 2.154 | 1.401 |\n| Condition-Architecture | \u2026 issues with project architecture as impediment to reuse | 1 | 7 | 2 | 2.630 | 1.597 |\n| DevSkill | Self-assessment of developer\u2019s software development skills compared to the average OSS developer (1=Much worse,\u2026, 5=Much better) | 1 | 5 | 3 | 3.269 | 0.989 |\n| ProjSize | Size of developer\u2019s current main project in number of developers | 1 | 999* | 2 | 6.091 | 44.420 |\n| Proj-Complexity | Complexity of developer\u2019s current main project compared to average project on SourceForge.net (1=Much less complex,\u2026, 5=Much more complex) | 1 | 5 | 3 | 2.947 | 1.029 |\n| ProjStack | Position of developer\u2019s current main project in software stack (1=Very low,\u2026, 5=Very high) | 1 | 5 | 4 | 3.333 | 0.921 |\n| DevOSS-Experience | Number of years developer has been active working on OSS projects | 1 | 40** | 5 | 5.668 | 4.709 |\n| DevProjTime | Average weekly hours developer works on her current main project | 0.5 | 58 | 5 | 8.775 | 10.723 |\n| DevProjShare | Share of work that has been done by developer in her current main project as opposed to other project team members | 5 | 100 | 90 | 67.436 | 36.998 |\n\n*The main project of this developer is Linux, where a very high number of project team members seems reasonable.\n\n**This developer claims to have been involved in OSS even before it got started. We assume that she implies that she had already been working on a project that later became OSS at that point in time.\n\nN=624.\n\nTable A6: Correlation Matrix of Explanatory Variables\n\n| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |\n|---|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|\n| 1 | BenefitEffectiveness | 1.00 | | | | | | | | | | | | | | | | |\n| 2 | BenefitEfficiency | n.m. | 1.00| | | | | | | | | | | | | | | |\n| 3 | BenefitQuality | n.m. | n.m.| 1.00| | | | | | | | | | | | | | |\n| 4 | BenefitTaskSelection | n.m. | n.m.| n.m.| 1.00| | | | | | | | | | | | | |\n| 5 | IssueControlLoss | n.m.
| n.m.| n.m.| n.m.| 1.00| | | | | | | | | | | | |\n| 6 | DevOSSNetsize | 0.14 | 0.11| | | | 1.00| | | | | | | | | | | |\n| 7 | DevOtherProjects | -0.08| 0.31| 1.00| | | | | | | | | | | | | | |\n| 8 | ProjPhase | 0.07 | -0.10| 0.17| 0.17| 1.00| | | | | | | | | | | | |\n| 9 | MotChallenge | -0.08| | | | | | 1.00| | | | | | | | | | |\n|10 | MotFun | 0.08 | | | | | | | 0.09| -0.08| 0.44| 1.00| | | | | | |\n|11 | MotLearning | 0.08 | | | | | | | 0.16| -0.09| -0.12| 0.29| 0.32| 1.00| | | | |\n|12 | MotCommunity | 0.09 | 0.14| 0.14| -0.07| 0.22| 0.13| 0.10| 0.11| 0.13| 0.21| 1.00| | | | | | |\n|13 | MotOSSReputation | 0.15 | 0.10| -0.08| 0.13| 0.15| 0.09| | 0.19| 0.19| 1.00| | | | | | | |\n|14 | MotSignaling | 0.07 | 0.16| | | | | | | | | | 0.25| 0.50| 1.00| | | |\n|15 | DevNorm | 0.07 | 0.19| 0.26| 0.21| 0.09| 0.07| 0.10| 0.12| 0.18| 0.12| 1.00| | | | | | |\n|16 | DevSkill | 0.12 | 0.07| 0.13| 0.10| 0.16| 0.15| 0.09| | | | | | | | | | |\n|17 | ProjPolSupport | 0.10 | 0.09| 0.23| 0.19| 0.15| 0.10| 0.09| 0.19| 0.11| 0.12| 0.12| 1.00| | | | | |\n|18 | ProjPolDiscourage | -0.09| 0.12| -0.07| -0.08| -0.07| 0.09| -0.11| 1.00| | | | | | | | | |\n|19 | ConditionLack | -0.08| -0.19| -0.08| | | | | | | | | | | | | | |\n|20 | ConditionLicense | -0.10| -0.15| | | | | | | | | | | | | | | |\n|21 | ConditionLanguage | -0.23| | | | | | | | | | | | | | | | |\n|22 | ConditionArchitecture| -0.16| -0.08| 0.07| -0.07| | | | | | | | | | | | | |\n|23 | ProjSize | -0.07| -0.08| 0.19| 0.11| 0.09| 0.09| 0.08| 0.11| | | | | | | | | |\n|24 | ProjComplexity | -0.11| 0.12| 0.09| 0.18| 0.19| 0.21| 0.09| 0.11| 0.38| 0.30| | | | | | | |\n|25 | ProjStack | 0.15 | | | | | | | | | | | | | | | | |\n|26 | ProjStandalone | 0.07 | | | | | | | | | | | | | | | | |\n|27 | DevOSSExperience | 0.13 | -0.08| 0.26| 0.29| 0.29| -0.15| 0.12| -0.13| 0.25| 0.09| | | | | | | |\n|28 | DevProjTime | -0.09| 0.13| 0.13| 0.11| 0.12| 0.11| 0.10| 0.21| 0.09| 0.17| 0.29| | | | | | |\n|29 | DevProjShare | -0.21| -0.08| -0.22| 0.07| | | | | | | | | | | | | |\n|30 | DevEduReuse | -0.09| -0.11| | | | | | | | | | | | | | | |\n|31 | DevProfEduReuse | 0.08 | | | | | | | | | | | | | | | | |\n|32 | DevProf | 0.13 | 0.11| 0.08| 0.08| -0.08| -0.10| -0.14| 0.07| 0.15| 0.08| 0.39| 0.07| | | | | |\n|33 | Residence-N. America | 0.07 | | | | | | | | | | | | | | | | |\n|34 | Residence-S. America | 0.07 | | | | | | | | | | | | | | | | |\n|35 | Residence-Asia & RoW | -0.08| -0.09| | | | | | | | | | | | | | | | |\n| | 18. ProjPolDiscourage | 19. ConditionLack | 20. ConditionLicense | 21. ConditionLanguage | 22. ConditionArchitecture | 23. ProjSize | 24. ProjComplexity | 25. ProjStack | 26. ProjStandalone | 27. DevOSSExperience | 28. DevProjTime | 29. DevProjShare | 30. DevEduReuse | 31. DevProfEduReuse | 32. DevProf | 33. Residence-N. America | 34. Residence-S. America | 35. Residence-Asia & RoW |\n|---|----------------------|------------------|---------------------|----------------------|--------------------------|-------------|------------------|-------------|------------------|------------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|\n| 18. ProjPolDiscourage | | | | | | | | | | | | | | | | | |\n| 19. ConditionLack | 1.00 | | | | | | | | | | | | | | | | |\n| 20. ConditionLicense | 0.17 | 1.00 | | | | | | | | | | | | | | | |\n| 21. ConditionLanguage | 0.23 | 0.24 | 1.00 | | | | | | | | | | | | | | |\n| 22. ConditionArchitecture | 0.24 | 0.11 | 0.33 | 1.00 | | | | | | | | | | | | | |\n| 23. 
ProjSize | 0.08 | 1.00 | | | | | | | | | | | | | | | |\n| 24. ProjComplexity | -0.09 | 0.18 | -0.09 | 0.16 | 1.00 | | | | | | | | | | | | |\n| 25. ProjStack | -0.13 | -0.07 | 0.07 | -0.11 | 1.00 | | | | | | | | | | | | |\n| 26. ProjStandalone | -0.101 | 0. | 0.13 | 0.37 | 1.00 | | | | | | | | | | | | |\n| 27. DevOSSExperience | 0.13 | 0.07 | 0.23 | -0.08 | 1.00 | | | | | | | | | | | | |\n| 28. DevProjTime | -0.08 | 0.14 | 0.15 | 0.38 | 0.11 | 1.00 | | | | | | | | | | | |\n| 29. DevProjShare | -0.20 | -0.08 | -0.08 | -0.17 | -0.34 | -0.10 | -0.15 | 1.00 | | | | | | | | | |\n| 30. DevEduReuse | -0.07 | 1.00 | | | | | | | | | | | | | | | |\n| 31. DevProfEduReuse | 0.09 | 0.08 | 1.00 | | | | | | | | | | | | | | |\n| 32. DevProf | -0.12 | -0.08 | 0.08 | -0.09 | -0.14 | 0.09 | 0.18 | 0.25 | 1.00 | | | | | | | | |\n| 33. Residence-N. America | 0.15 | 0.09 | 1.00 | | | | | | | | | | | | | | |\n| 34. Residence-S. America | -0.08 | 0.08 | n.m. | 1.00 | | | | | | | | | | | | | |\n| 35. Residence-Asia & RoW | n.m. | n.m. | 1.00 | | | | | | | | | | | | | | |\n\nNotes: Only correlations with p<0.1 are shown; n.m. = not meaningful because variables are dummy variables coding the same characteristic or are scores of the same exploratory factor analysis.\n\nTable A7: Multivariate Analysis of Developers\u2019 Reuse Behavior \u2013 Robustness Check\n\n| | (4) Past importance of reuse (Likert scale) | (5) Past importance of reuse (percentage scale) | (6) Future importance of reuse (Likert scale) |\n|---------------------------------|-----------------|-----------------|-----------------|\n| **Attitude toward reuse** | | | |\n| BenefitEffectiveness (H1a) | 0.220*** (0.076) | 2.464** (1.010) | 0.146** (0.062) |\n| BenefitEfficiency (H1b) | 0.634*** (0.080) | 6.047*** (1.059) | 0.499*** (0.066) |\n| BenefitQuality (H1c) | 0.322*** (0.079) | 2.262** (1.048) | 0.273*** (0.065) |\n| BenefitTaskSelection (H1d) | 0.157** (0.077) | 3.368*** (1.026) | 0.144** (0.064) |\n| IssueControlLoss (H1e) | | | |\n| **Access to local search** | | | |\n| DevOSSNetsize (log) (H2a) | 0.172** (0.080) | 2.307** (1.047) | 0.246*** (0.066) |\n| DevOtherProjects (H2b) | 0.030* (0.015) | 0.465** (0.196) | 0.034*** (0.013) |\n| **Project maturity** | | | |\n| ProjPhase (H3) | -0.124* (0.066) | -2.984*** (0.871) | -0.204*** (0.054) |\n| **Compatibility with project goals** | | | |\n| MotChallenge (H4a) | | -2.466** (0.962) | |\n| MotFun (H4b) | | | |\n| MotLearning (H4c) | | | |\n| MotCommunity (H4d) | 0.180** (0.081) | 1.912* (1.067) | 0.163** (0.066) |\n| MotOSSReputation (H4e) | | | |\n| MotSignaling (H4f) | | | |\n| **Subjective norms** | | | |\n| DevNorm | 0.120* (0.065) | 2.133** (0.870) | 0.205*** (0.054) |\n| **Perceived behavioral control** | | | |\n| ProjPolSupport | 0.405** (0.180) | | 0.335** (0.143) |\n| ProjPolDiscourage | -1.210*** (0.447) | | -1.299*** (0.375) |\n| ConditionLack | -0.236*** (0.042) | -2.355*** (0.564) | -0.160*** (0.035) |\n| ConditionLicense | | | |\n| ConditionLanguage | | | |\n| ConditionArchitecture | | | |\n| DevSkill | | | |\n| **Further control variables** | | | |\n| ProjSize | | | |\n| ProjComplexity | | | |\n| ProjStack | 0.232*** (0.083) | | 0.172** (0.069) |\n| ProjStandalone | | | |\n| DevOSSExperience | | | |\n| DevProjTime | 0.016** (0.007) | | |\n| DevProjShare | | | |\n| DevProf | | | |\n| DevEduReuse | | | |\n| DevProfEduReuse | 0.573** (0.232) | 5.581* (3.012) | 0.414** (0.189) |\n| Residence-N. America | | | |\n| Residence-S. America | | | |
| Residence-Asia & RoW | | | |\n| Constant | 3.145*** (0.622) | 34.228*** (8.393) | 2.858*** (0.509) |\n| **Observations** | 624 | 624 | 624 |\n| **Pseudo R\u00b2** | 0.101 | 0.026 | 0.112 |\n| **Likelihood ratio** | $\chi^2(15)=252.81, p<0.0001$ | $\chi^2(12)=149.36, p<0.0001$ | $\chi^2(14)=272.67, p<0.0001$ |\n| $\sigma$ | 1.814 | 24.600 | 1.514 |\n\nNotes: All models are Tobit models; standard errors in parentheses; * significant at 10%; ** significant at 5%; *** significant at 1%. Eliminated variables are also jointly insignificant.", "source": "olmocr", "added": "2025-06-23", "created": "2025-06-23", "metadata": {"Source-File": "/home/nws8519/git/adaptation-slr/studies_pdfs/016-sojer-henkel.pdf", "olmocr-version": "0.1.76", "pdf-total-pages": 46, "total-input-tokens": 104794, "total-output-tokens": 35265, "total-fallback-pages": 0}, "attributes": {"pdf_page_numbers": [[0, 2095, 1], [2095, 4729, 2], [4729, 7241, 3], [7241, 9528, 4], [9528, 12001, 5], [12001, 14459, 6], [14459, 16811, 7], [16811, 19307, 8], [19307, 20440, 9], [20440, 22592, 10], [22592, 25271, 11], [25271, 27569, 12], [27569, 30155, 13], [30155, 32656, 14], [32656, 34590, 15], [34590, 37336, 16], [37336, 40196, 17], [40196, 43135, 18], [43135, 45443, 19], [45443, 48549, 20], [48549, 51096, 21], [51096, 54268, 22], [54268, 55865, 23], [55865, 58581, 24], [58581, 61144, 25], [61144, 64062, 26], [64062, 71751, 27], [71751, 74160, 28], [74160, 76793, 29], [76793, 79345, 30], [79345, 81788, 31], [81788, 84191, 32], [84191, 86852, 33], [86852, 89139, 34], [89139, 91839, 35], [91839, 94556, 36], [94556, 97229, 37], [97229, 100055, 38], [100055, 102483, 39], [102483, 105998, 40], [105998, 108120, 41], [108120, 111498, 42], [111498, 114512, 43], [114512, 119392, 44], [119392, 121944, 45], [121944, 127115, 46]]}}
{"id": "a91b331428ccf4b3945820855761059f1f6800b7", "text": "What to Expect from Code Review Bots on GitHub? A Survey with OSS Maintainers\n\nMairieli Wessel \nmairieli@ime.usp.br \nUniversity of S\u00e3o Paulo\n\nAlexander Serebrenik \na.serebrenik@tue.nl \nEindhoven University of Technology\n\nIgor Wiese \nigor@utfpr.edu.br \nUniversidade Tecnol\u00f3gica Federal do Paran\u00e1\n\nIgor Steinmacher \nigor.steinmacher@nau.edu \nNorthern Arizona University\n\nMarco A. Gerosa \nmarco.gerosa@nau.edu \nNorthern Arizona University\n\nABSTRACT\nSoftware bots are used by Open Source Software (OSS) projects to streamline the code review process. Interfacing between developers and automated services, code review bots report continuous integration failures, code quality checks, and code coverage. However, the impact of such bots on maintenance tasks is still neglected. In this paper, we study how project maintainers experience code review bots. We surveyed 127 maintainers and asked about their expectations and perception of changes incurred by code review bots. Our findings reveal that the most frequent expectations include enhancing the feedback bots provide to developers, reducing the maintenance burden for developers, and enforcing code coverage. While maintainers report that bots satisfied their expectations, they also perceived unexpected effects, such as communication noise and newcomers\u2019 dropout. Based on these results, we provide a series of implications for bot developers, as well as insights for future research.\n\nCCS CONCEPTS\n\u2022 Human-centered computing \u2192 Open source software; \u2022 Software and its engineering \u2192 Software creation and management.\n\nKEYWORDS\nsoftware bots, pull-based model, open source software, code review\n\nACM Reference Format:\nMairieli Wessel, Alexander Serebrenik, Igor Wiese, Igor Steinmacher, and Marco A. Gerosa. 2020. What to Expect from Code Review Bots on GitHub? A Survey with OSS Maintainers. In 34th Brazilian Symposium on Software Engineering (SBES \u201920), October 21\u201323, 2020, Natal, Brazil. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3422392.3422459\n\n1 INTRODUCTION\nCode review is a software quality assurance practice [8] common in Open Source Software (OSS) projects [3]. Since open source development involves a community of geographically dispersed developers [23], projects are often hosted on social coding platforms, such as GitHub [7]. To receive external contributions, repositories are shared by fork, and modified by pull requests. In the pull-based development model, project maintainers spend a non-negligible time inspecting code changes and engaging in discussion with contributors to understand and improve the modifications before integrating them into the codebase [15, 33]\n\nOpen source software communities use software bots to assist and streamline the code review process [9, 29]. In short, bots are software applications that integrate with human tasks, serving as interfaces that connect developers and other tools [26], and providing additional value to human users [12]. Accomplishing tasks that were previously performed solely by human developers, and interacting in the same communication channels as their human counterparts, bots have become new voices in the code review conversation [17]. According to Wessel et al. [29], code review bots differ from other bots by guiding contributors to provide necessary information before maintainers review the pull requests. 
On GitHub, these bots are responsible for leaving comments on pull requests, reporting continuous integration failures, code quality checks, and code coverage.\n\nIn theory, the automation provided by these bots should save maintainers effort and time [25], and lead them to focus on higher priority aspects of code review [2]. Nevertheless, the adoption of a code review bot, similar to any technological adoption, can bring unexpected consequences. Since, according to Mulder et al. [18], many effects are not directly caused by the new technology itself, but by the changes in human behavior that it provokes, it is important to assess and discuss the effects of new technology. In the case of the effects of software bots on project maintainers, this assessment is often neglected.\n\nIn this paper, we aim to understand why open source maintainers integrate code review bots into the pull request workflow and how they perceive the changes these bots induce. In short, we answer the following research questions:\n\nRQ1. What motivates maintainers to adopt code review bots?\nRQ2. How do maintainers perceive the changes code review bots introduce to the software process?\n\nTo achieve our goal, we conducted a survey with 127 maintainers of OSS projects hosted on GitHub that adopted code review bots. We investigate the maintainers\u2019 perceptions of whether project activity indicators change after bot adoption, such as the number of pull requests received, merged, and non-merged, the number of comments, and the time to close pull requests.\n\nAnalyzing the survey results, we found that maintainers were predominantly motivated by reducing their effort on tedious tasks to allow them to focus on more interesting ones, and enhancing the feedback communicated to developers. Regarding the changes introduced by the bot, we noted that less manual effort was required after adoption, high-quality code was enforced, and pull request reviews sped up. However, four maintainers also reported unexpected aspects of bot adoption, including communication noise, more time spent on tests, newcomers\u2019 dropout, and bots impersonating maintainers, which stressed out contributors.\n\nOur contributions are twofold: (i) a set of maintainers\u2019 motivations for using a bot to assist the code review process; and (ii) a discussion of how maintainers see the impact of bot introduction and support. These contributions may help maintainers anticipate bots\u2019 effects on a project, and guide bot developers to consider the implications of new bots as they design them. Our findings, while preliminary, can suggest research hypotheses on the impact of code review bots on the code review process in open source projects, which follow-up studies can support or refute.\n\n2 BACKGROUND AND RELATED WORK\n\nSoftware bots have been designed to assist with the technical and social aspects of software development activities [13], including communication and decision-making [25]. Basically, these bots act as a conduit between software developers and other tools [25]. Wessel et al. have shown that bot adoption is indeed widespread in OSS projects hosted on GitHub [29]. GitHub bots have been developed to be integrated into the pull request workflow to perform a variety of tasks beyond code review support [31]. 
These tasks include repairing bugs [17, 27, 28], refactoring the code [32], recommending tools [4], detecting duplicated development [20], updating dependencies [16], and fixing static analysis violations [5].\n\nDespite their increasing popularity, understanding the effects of bots is a major challenge. Storey and Zagalsky [25] and Paikari and van der Hoek [19] highlight that the potential negative impact of task automation through bot technology is still neglected. While bots are often used to avoid interruptions to developers\u2019 work, they may lead to other, less obvious distractions [25]. Additionally, Liu et al. [14] claim that bots may have negative impacts on the user experience of open source contributors, since the needs and preferences of maintainers and contributors are not the same. While previous studies provide recommendations on how to evaluate bots\u2019 capabilities and performance [1, 4], they do not draw attention to the impact of bot adoption on software development or on how software engineers perceive the bots\u2019 effects.\n\nWessel et al. [29] investigated the usage and impact of software bots to support contributors and maintainers with pull requests. After identifying bots on popular GitHub repositories, the authors classified these bots into 13 categories according to the tasks they perform. Code review bots are the third most frequently used category. Wessel et al. [30] also employed a regression discontinuity design on OSS projects, revealing that bot adoption increases the number of monthly merged pull requests, decreases monthly non-merged pull requests, and decreases communication among developers.\n\nPrior work has also investigated the impact of continuous integration (CI) and code review tools on GitHub projects [6, 11, 34]. While Zhao et al. [34] and Cassee et al. [6] investigated the impact of the Travis CI tool\u2019s introduction on development practices, Kavaler et al. [11] turned to the impact of linters, dependency managers, and coverage reporter tools. Our work extends the literature by providing an understanding of why code review bots are being adopted and the effects of such adoption, focusing on the perceptions of open source maintainers.\n\n3 STUDY METHODOLOGY\n\nWe conducted a survey to obtain insights on how open source maintainers perceive the impact of using code review bots on pull requests and the effects of these bots on the project activities.\n\n3.1 Survey Design\n\nWe first identified OSS projects hosted on GitHub that at some point had adopted at least one code review bot [29]. To find these projects, we queried the GHTorrent dataset [10], searching for projects that had received comments on pull requests from any of the code review bots identified by Wessel et al. [29] (a query sketch is shown below). For each project, we determined when a bot was introduced based on the date of the bot\u2019s first comment. Afterwards, we contacted maintainers who merged more than one pull request before and after bot adoption. To avoid duplicate invitations, we kept only the first record of maintainers who appeared in more than one project. Our initial target population comprised 1,960 maintainers of projects that adopted code review bots and made their e-mail addresses publicly available via the GitHub API.\n\nTo increase survey participation, we followed the best practices described by Smith et al. [21], such as sending personalized invitations and allowing participants to remain anonymous. 
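The project-identification step above can be expressed as a single query against a relational GHTorrent mirror. The following is a minimal sketch, not the study's actual script; the table names follow GHTorrent's public MySQL dumps, and the bot logins and connection string are illustrative placeholders:

```python
# Sketch: date each project's bot adoption by the first pull request
# comment left by a known code review bot (illustrative logins only).
import pandas as pd
from sqlalchemy import create_engine

QUERY = """
SELECT p.id              AS project_id,
       MIN(c.created_at) AS bot_adopted_at
FROM pull_request_comments c
JOIN users u          ON u.id  = c.user_id
JOIN pull_requests pr ON pr.id = c.pull_request_id
JOIN projects p       ON p.id  = pr.base_repo_id
WHERE u.login IN ('codecov-io', 'coveralls')  -- illustrative bot accounts
GROUP BY p.id;
"""

engine = create_engine("mysql+pymysql://user:password@localhost/ghtorrent")
adopters = pd.read_sql(QUERY, engine)  # one row per bot-adopting project
```

Maintainers who merged pull requests both before and after each project's bot_adopted_at date could then be selected as the candidate survey population.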
The survey was set up as an online questionnaire, and it was sent on September 18, 2019. We received answers for 3 months and sent a reminder in October 2019. Participation was voluntary, and the estimated time to complete the survey was 10 minutes. We received answers from 127 maintainers, while the delivery of 26 messages failed. For this survey, we had a response rate of $\approx 6.55\%$, which is consistent with other studies in software engineering [22].\n\nOur maintainers\u2019 survey had three main questions, which we made publicly available.1 In summary, we asked maintainers about their expectations and perception of changes caused by the adoption of a code review bot. Regarding the changes at the software process level, we asked maintainers about the same activity indicators studied by Wessel et al. [29]: the number of opened, merged, and non-merged pull requests, the number of comments, and the time to close pull requests.\n\n1https://zenodo.org/record/3992379#.Xz1_iSlKg3E\n\n3.2 Data analysis\n\nWe used a card sorting approach [35] to qualitatively analyze the answers to the open-ended questions Q1 and Q3. Two researchers conducted card sorting in two steps. In the first step, each researcher analyzed the answers (cards) independently and applied codes to each answer, sorting them into meaningful groups. This step was followed by a discussion meeting until the researchers reached a consensus on the code names and categorization of each item. At the end of this process, the answers were sorted into high-level groups. In the second step, the researchers analyzed the categories, aiming to refine the classification and group related codes into more significant, higher-level categories and themes. We used open card sorting, meaning we had no predefined codes or groups; the codes emerged and evolved during the analysis process. In addition, we quantitatively analyzed the closed-ended question (Q2) to understand developers\u2019 perceptions of the impact of bots on pull requests.\n\n4 RESULTS\n\nIn this section, we report our main findings.\n\n4.1 Maintainers\u2019 Motivations to Adopt a Code Review Bot\n\nWe asked maintainers what made them decide to start using bots to support code review activities. Four participants (3.15%) did not report any reason. The other answers were grouped into 10 categories, as can be seen in Table 1.\n\nTable 1: Reasons for adoption of code review bots\n\n| Reasons | # of answers (%) |\n|--------------------------------|------------------|\n| Enhance feedback to developers | 31 (24.4%) |\n| Reduce maintainers\u2019 effort | 30 (23.6%) |\n| Enforce high code coverage | 22 (17.3%) |\n| Automate routine tasks | 20 (15.7%) |\n| Ensure high-quality standards | 20 (15.7%) |\n| Detect change effects | 7 (5.5%) |\n| Curiosity | 5 (3.9%) |\n| Improve interpersonal communication | 5 (3.9%) |\n| Lack of available tools | 5 (3.9%) |\n| Outside contributor\u2019s suggestion | 2 (1.6%) |\n\nFrom the maintainers\u2019 perspective, the most recurrent motivation relates to enhancing the feedback to developers (31 mentions). 
This category includes cases in which the respondents desired to see both code review metrics and additional information \u201cin a pretty and automated fashion\u201d and \u201cwithout having to go to another tool.\u201d Several respondents recognized the value of bot feedback for both reviewers and contributors: \u201cbots write useful information as comments and you can analyze it without switching the context.\u201d In addition, other respondents pointed out the importance of \u201cgiving uniform feedback to all contributors\u201d and \u201clet[ting] contributors see how they affect the code.\u201d Another two respondents mentioned that this kind of feedback might also increase contributors\u2019 public accountability, giving reviewers \u201cconfidence that the author cares about testing\u201d and about the quality of the code contribution.\n\nAnother recurrent reason regards reducing maintainers\u2019 effort (30 mentions). Several maintainers were motivated by the necessity to save time and reduce their own effort during the code review process. Most of them said that reducing maintainers\u2019 effort on trivial tasks, such as finding syntax errors and checking code style and coverage requirements, allows them to \u201cspend more time on the important parts.\u201d Moreover, the feedback provided by a code review bot helps maintainers avoid \u201crepeating the same comments for each pull request.\u201d\n\nWith 22 mentions, enforcing high code coverage during the code review process was the third most common reason. In general, respondents mentioned that code review bots were adopted to help detect and prevent reduction in code coverage. They also mentioned that these bots \u201censure good coverage to allow changes on the code base with high confidence that the project will continue to function as expected\u201d since they \u201cdon\u2019t want to drop (significantly) in coverage.\u201d Respondents (20) also reported another related reason: ensure high-quality standards. Respondents said that using code review bots for \u201cautomating repetitive tasks ensures they get done, increasing code quality\u201d and \u201creduce[s] the risk of bugs being missed by reviewers.\u201d\n\nSeveral maintainers (20) were also motivated by automating routine tasks that were previously performed manually. Respondents mentioned the desire to automate routine tasks in order to structure the process of code review and \u201cmake the process more repeatable.\u201d The routine tasks include tracking the coverage and \u201cautomatically upload[ing] code coverage results to a 3rd-party service.\u201d Others provided more generic answers, briefly mentioning \u201cautomation.\u201d\n\nMaintainers were also motivated by curiosity to test a new technological tool and by a suggestion of an outside contributor. In the other five cases, our respondents were motivated by improving interpersonal communication, since \u201can automatic answer by a bot isn\u2019t taken personally\u201d and \u201cit is a friendly way to ensure quality.\u201d Moreover, a code review bot \u201cimproves interpersonal communication on pull requests and thus may reduce the chance a pull request is abandoned by the author.\u201d\n\nAnswer to RQ1. Maintainers reported 10 reasons for using code review bots. 
We found that several maintainers were motivated by enhancing the feedback to developers (24.4%), reducing their own efforts (23.6%), and enforcing high code coverage (17.3%).\n\n4.2 Maintainers\u2019 Perceptions of Bots\u2019 Effects\n\nWe also asked maintainers about their perspective on the potential changes to their projects that the code review bot introduced. The answers followed a 5-point Likert scale with neutral, ranging from \u201cStrongly disagree\u201d to \u201cStrongly agree.\u201d In Figure 1, we observe that most of the respondents did not agree with the expected impact of bot adoption on pull requests, considering the five studied activity indicators: number of pull requests received, merged, and non-merged; number of comments; and the time to close pull requests.\n\nMost of the respondents claimed that there is no relation between the number of pull requests and the presence of the bot; they stated that the amount of opened pull requests \u201cdepends on bugs or features for the software.\u201d However, one respondent claimed that it could lead to an increase in the number of pull requests, and \u201ca better experience for everyone involved (which might eventually lead to repeat contributors).\u201d Regarding merged and non-merged pull requests, maintainers claimed that these trends are typically \u201chuman factors\u201d unrelated to bot adoption. One maintainer believed that the ability to filter out contributions that reduce code quality also reduces the merge rates of pull requests.\n\nRespondents (36%) perceived an increase in the number of comments made to pull requests after bot adoption. One respondent claimed that this increase occurs because contributions that drastically reduce the coverage stimulate the exchange of comments between maintainers and contributors. Another maintainer explained that the number of comments increased because maintainers and \u201ccontributors started discussing how to best test something.\u201d\n\nMaintainers (41% of them) believe that the code review bot helped decrease the time-to-close pull requests. One respondent did not agree with the statement, and left a comment telling us that the code review bot actually increased the time to merge pull requests, due to the need for additional time to write tests and obtain stable code. Another maintainer commented that the bot increases the time to merge the contributions, though to them \u201cit is not perceived as a bad thing.\u201d\n\nWe also openly asked maintainers about the changes introduced by the adoption of code review bots on the maintenance process and in the project itself. Twenty-three participants (18.1%) did not report any change. The other responses were grouped into 13 categories, as can be seen in Table 2.\n\nThe most recurrent reported change is that the adoption of code review bots requires less manual labor from maintainers (33 mentions). 
In general, respondents mentioned that the maintenance process is easier when they have fewer manual tasks to perform, because they \u201cneed to spend less time on it.\u201d The maintainers also suggest that bots could help reduce the human resources necessary to complete a task, which makes \u201cit easier by reducing the number of review comments, general feedback and manual quality assurance required for a successful merge.\u201d Nevertheless, maintainers are also aware that \u201cautomation like this is always prone to non-fatal error.\u201d\n\nSeveral maintainers (20) noticed changes in the quality of the contributions received, reporting that the bot helps to enforce high-quality code. In one example, a respondent mentioned that \u201cthe introduction of bots increased the quality of the code seen by maintainers in the initial review since contributors got timely (a few minutes) feedback about parts that failed basic quality standards such as missing tests, missing documentation, incorrect style, or broken functionality.\u201d Another 6 respondents also realized positive effects on the quality of the code review process, which \u201ctranslate in a more efficient code review and more robust codebase in the long term.\u201d\n\nSince one of the most common reasons to adopt a code review bot is to enforce code coverage, unsurprisingly, 16 respondents mentioned the increase in the code coverage after adoption. Most of the respondents reported that these bots help to \u201cencourage to add more tests\u201d when \u201cthe coverage is not good enough.\u201d One respondent stated the importance of the awareness of code coverage: \u201cthe effects are visible to the contributors, and they will generally resolve any decreased coverage in the pull request.\u201d Additionally, one respondent claimed that the bot feedback also \u201cspurred further pull requests to increase coverage.\u201d\n\nAnother bot adoption effect is that reviewing pull requests became faster, which was reported by 16 maintainers. Three respondents mentioned that faster reviews lead to faster merging. A respondent stated that high-quality pull requests were more quickly identified since \u201cthe human review step was always started with a baseline level of quality\u201d and thus merged faster. In addition, another maintainer reinforced the efficiency of this process: \u201csome of the bots do it so well, that we can merge pull requests immediately after opening it.\u201d In addition, 7 maintainers also reported that the quality of the code review process improved.\n\nOther categories, although less recurrent, called our attention to the negative effects reportedly caused by bot adoption. One respondent said that bots intimidate newcomers, since some newcomers close their pull requests after a bot comment. Another believes that, for a newcomer, receiving an assessment \u201cyou let coverage go down,\u201d instead of a \u201cthanks for your contribution,\u201d \u201ccan be a little daunting.\u201d Respondents also mentioned that after adoption, testing started to require more time than development and the bot\u2019s comments introduced noise. Another respondent said that a bot can impersonate human developers due to bots\u2019 strict rules, which stressed out contributors.\n\nAnswer to RQ2. Among the positive changes incurred from code review bots, maintainers reported that less manual labor was required after bot adoption (25.9%) and bots enforced high-quality code (15.7%). 
The negative effects include communication noise, more time spent with tests, newcomers\u2019 dropout, and bots impersonating maintainers.\n\n5 DISCUSSION AND IMPLICATIONS\n\nAdding a code review bot to a project can represent the desire to better communicate with developers, helping contributors and maintainers be more effective, and achieving improved interpersonal communication, as already discussed by Storey and Zagalsky [25]. In fact, our results reveal that the predominant reason for using a code review bot is to improve the feedback communicated to developers. Moreover, maintainers are also interested in automating code review tasks to reduce the maintenance burden and enforce high code coverage.\n\nMost of the maintainers\u2019 perceptions of how bots impact maintenance are in line with the reported motivations. Indeed, maintainers started to spend less effort on trivial tasks, allowing them to focus on more important aspects of code review. Furthermore, code review bots guide contributors toward detecting change effects before maintainers triage the pull requests [29], ensuring high-quality standards and a faster code review. Bots\u2019 feedback provides an immediate and clear sense of what contributors need to do to have their contribution reviewed. Maintainers also noted that contributors\u2019 confidence increased when a code review bot provided situational awareness [25], indicating standards, language issues, and coverage to contributors.\n\nOn the one hand, adopting a bot saves maintainers costs, time, and effort during the code review activities. On the other hand, our study also reports four unexpected and negative effects of adopting a bot to assist the code review process. Such effects include communication noise, more time spent with tests, newcomers\u2019 dropout, and bots impersonating maintainers. Although less recurrent, these effects are non-negligible to the OSS community.\n\nPrevious work by Wessel et al. [29] has already mentioned the support for newcomer onboarding both in terms of challenges and as a feature maintainers desire. In our survey, maintainers claim it is easier for newcomers to submit a high-quality pull request with only the intervention of bots. However, another maintainer pointed out that when newcomers and casual contributors receive feedback from the bot, it can lead to rework, discussions, and ultimately dropping out from contributing.\n\nOur study suggests practical implications for practitioners as well as insights and suggestions for researchers.\n\nAwareness of bot effects. Indeed, the maintenance activities changed following the adoption of code review bots. This change can directly affect contributors\u2019 and maintainers\u2019 work. Hence, understanding how code review bot adoption affects a project is important for practitioners, mainly to avoid unexpected or even undesired effects. Awareness of unexpected bot effects can lead maintainers to take countermeasures and/or decide whether or not to use a code review bot.\n\nImproving bots\u2019 design. Anyone who wants to develop a bot to support the code review process needs to consider the impact the bot may have on both technical and social contexts. Based on our results, further bot improvements can be envisioned. For example, in order to prevent bots from introducing communication noise, bot developers should know when and to what extent the bot should interrupt a human [14, 24].\n\nImproving newcomers\u2019 support. 
As aforementioned, previous literature on bots already mentioned a lack of support for newcomers [29]. It is reasonable to expect that newcomers who receive friendly feedback will have a higher engagement level and thus sustain their participation in the project. Hence, future research can help bot designers by providing guidelines and insights to support new contributors.\n\n6 THREATS TO VALIDITY\n\nSince we leverage qualitative research methods to categorize the open-ended questions asked in our survey, we may have introduced categorization bias. To mitigate this bias, we conducted this process in pairs and carefully discussed categorization among the authors. Regarding our survey, the order in which we presented the questions to the respondents may have influenced the way they answered them. In addition, we cannot guarantee that maintainers correctly understood sentences 4 and 5. We tried to order the questions based on the natural sequence of actions to help respondents understand the questions\u2019 context.\n\n7 FINAL CONSIDERATIONS\n\nIn this work, we conducted a preliminary investigation into maintainers\u2019 perceptions of the effects of adopting bots to support the code review process on pull requests. The most frequently mentioned motivations for using bots include automating repetitive tasks, improving tools\u2019 feedback to developers, and reducing maintenance effort (RQ1). Moreover, maintainers cite several benefits of bots, such as decreasing the time to close pull requests and reducing the workload of laborious and repetitive tasks. However, maintainers also stated negative effects, including the introduction of communication noise and newcomers\u2019 dropout (RQ2). Based on these preliminary findings, future research can focus on better supporting and understanding bots\u2019 influences on social interactions in the context of OSS projects. Moreover, future work can investigate the effects of adopting a bot, and can expand our analysis to other types of bots, activity indicators, and social coding platforms.\n\nACKNOWLEDGMENTS\n\nWe thank all the participants of this study, who volunteered to support our research. This work was partially supported by the Coordena\u00e7\u00e3o de Aperfei\u00e7oamento de Pessoal de N\u00edvel Superior \u2013 Brasil (CAPES) \u2013 Finance Code 001, CNPq (grant 141222/2018-2), and National Science Foundation (grants 1815503 and 1900903).\n\nREFERENCES\n\n[1] Ahmad Abdellatif and Emad Shihab. 2020. MSRBot: Using Bots to Answer Questions from Software Repositories. Empirical Software Engineering (EMSE) 25 (2020), 1834\u20131863. https://doi.org/10.1007/s10664-019-09788-5\n\n[2] Alberto Bacchelli and Christian Bird. 2013. Expectations, outcomes, and challenges of modern code review. In 2013 35th International Conference on Software Engineering (ICSE). IEEE, 712\u2013721.\n\n[3] Olga Baysal, Oleksii Kononenko, Reid Holmes, and Michael W Godfrey. 2016. Investigating technical and non-technical factors influencing modern code review. Empirical Software Engineering 21, 3 (2016), 932\u2013959.\n\n[4] Chris Brown and Chris Parnin. 2019. Sorry to Bother You: Designing Bots for Effective Recommendations. In Proceedings of the 1st International Workshop on Bots in Software Engineering (Montreal, Quebec, Canada) (BotSE \u201919). IEEE Press, Piscataway, NJ, USA, 54\u201358. https://doi.org/10.1109/BotSE.2019.00021\n\n[5] A. Carvalho, W. Luz, D. Marcilio, R. Bonif\u00e1cio, G. Pinto, and E. Dias Canedo. 2020. C-3PR: A Bot for Fixing Static Analysis Violations via Pull Requests. 
In 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE Computer Society.\n\n[6] Nathan Cassee, Bogdan Vasilescu, and Alexander Serebrenik. 2020. The silent helper: the impact of continuous integration on code reviews. In 27th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 49\u201360.\n\n[7] Linda Erlenhov, Francisco Gomes de Oliveira Neto, Riccardo Scandariato, and Philipp Leitner. 2019. Current and Future Bots in Software Development. In Proceedings of the 1st International Workshop on Bots in Software Engineering (Montreal, Quebec, Canada) (BotSE \u201919). IEEE Press, Piscataway, NJ, USA, 7\u201311. https://doi.org/10.1109/BotSE.2019.00009\n\n[8] Georgios Gousios and Diomidis Spinellis. 2012. GHTorrent: GitHub\u2019s data from a firehose. In 2012 9th IEEE Working Conference on Mining Software Repositories (MSR). IEEE, 12\u201321.\n\n[9] David Kavaler, Asher Trockman, Bogdan Vasilescu, and Vladimir Filkov. 2019. Tool choice matters: JavaScript quality assurance tools and usage outcomes in GitHub projects. In Proceedings of the 41st International Conference on Software Engineering. IEEE Press, 476\u2013487.\n\n[10] Carlene Lebeuf, Alexey Zagalsky, Matthieu Foucault, and Margaret-Anne Storey. 2019. Defining and Classifying Software Bots: A Faceted Taxonomy. In Proceedings of the 1st International Workshop on Bots in Software Engineering (Montreal, Quebec, Canada) (BotSE \u201919). IEEE Press, Piscataway, NJ, USA, 1\u20136. https://doi.org/10.1109/BotSE.2019.00008\n\n[11] Bin Lin, Alexey Zagalsky, Margaret-Anne Storey, and Alexander Serebrenik. 2016. Why developers are slacking off: Understanding how software teams use slack. In Proceedings of the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing Companion. ACM, 333\u2013336.\n\n[12] Dongyu Liu, Micah J. Smith, and Kalyan Veeramachaneni. 2020. Understanding User-Bot Interactions for Small-Scale Automation in Open-Source Development. In Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI EA \u201920). Association for Computing Machinery, New York, NY, USA, 1\u20138. https://doi.org/10.1145/3334480.3382998\n\n[13] Shane McIntosh, Yasutaka Kamei, Bram Adams, and Ahmed E Hassan. 2014. The impact of code review coverage and code review participation on software quality: A case study of the qt, vtk, and itk projects. In Proceedings of the 11th Working Conference on Mining Software Repositories. 192\u2013201.\n\n[14] Samim Mirhosseini and Chris Parnin. 2017. Can Automated Pull Requests Encourage Software Developers to Upgrade Out-of-date Dependencies?. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering (Urbana-Champaign, IL, USA) (ASE \u201917). IEEE Press, Piscataway, NJ, USA, 84\u201394. http://dl.acm.org/citation.cfm?id=3155562.3155577\n\n[15] Martin Monperrus. 2019. Explainable Software Bot Contributions: Case Study of Automated Bug Fixes. In Proceedings of the 1st International Workshop on Bots in Software Engineering (Montreal, Quebec, Canada) (BotSE \u201919). IEEE Press, Piscataway, NJ, USA, 12\u201315. https://doi.org/10.1109/BotSE.2019.00010\n\n[16] KF Mulder. 2013. Impact of new technologies: how to assess the intended and unintended effects of new technologies. Handbook of Sustainable Engineering (2013).\n\n[17] Elahe Paikari and Andr\u00e9 van der Hoek. 2018. A Framework for Understanding Chatbots and Their Future. 
In Proceedings of the 11th International Workshop on Cooperative and Human Aspects of Software Engineering (Gothenburg, Sweden) (CHASE \u201918). ACM, New York, NY, USA, 13\u201316. https://doi.org/10.1145/3195836.3195859\n\n[18] Luyao Ren, Shurui Zhou, Christian K\u00e4stner, and Andrzej W\u0105sowski. 2019. Identifying Redundancies in Fork-based Development. In 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 230\u2013241.\n\n[19] Edward Smith, Robert Loftin, Emerson Murphy-Hill, Christian Bird, and Thomas Zimmermann. 2013. Improving developer participation rates in surveys. In 2013 6th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE). IEEE, 89\u201392.\n\n[20] Igor Steinmacher, Gustavo Pinto, Igor Scaliante Wiese, and Marco A. Gerosa. 2018. Almost There: A Study on Quasi-contributors in Open Source Software Projects. In Proceedings of the 40th International Conference on Software Engineering (Gothenburg, Sweden) (ICSE \u201918). ACM, New York, NY, USA, 256\u2013266. https://doi.org/10.1145/3180155.3180208\n\n[21] Igor F\u00e1bio Steinmacher. 2015. Supporting newcomers to overcome the barriers to contribute to open source software projects. Ph.D. Dissertation. Universidade de S\u00e3o Paulo.\n\n[22] Margaret-Anne Storey, Alexander Serebrenik, Carolyn Penstein Ros\u00e9, Thomas Zimmermann, and James D. Herbsleb. 2020. BOTse: Bots in Software Engineering (Dagstuhl Seminar 19471). Dagstuhl Reports 9, 11 (2020), 84\u201396.\n\n[23] Margaret-Anne Storey and Alexey Zagalsky. 2016. Disrupting Developer Productivity One Bot at a Time. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (Seattle, WA, USA) (FSE 2016). ACM, New York, NY, USA, 928\u2013931. https://doi.org/10.1145/2950290.2938989\n\n[24] Margaret-Anne Storey, Alexey Zagalsky, Fernando Figueira Filho, Leif Singer, and Daniel M. German. 2017. How Social and Communication Channels Shape and Challenge a Participatory Culture in Software Development. IEEE Trans. Softw. Eng. 43, 2 (Feb. 2017), 185\u2013204. https://doi.org/10.1109/TSE.2016.2584053\n\n[25] Simon Urli, Zhongxing Yu, Lionel Seinturier, and Martin Monperrus. 2018. How to Design a Program Repair Bot? Insights from the Repairnator Project. In Proceedings of the 40th International Conference on Software Engineering: Software Engineering in Practice (Gothenburg, Sweden) (ICSE-SEIP \u201918). ACM, New York, NY, USA, 95\u2013104. https://doi.org/10.1145/3183519.3183540\n\n[26] Rijnard van Tonder and Claire Le Goues. 2019. Towards s/engineer/bot/: Principles for Program Repair Bots. In Proceedings of the 1st International Workshop on Bots in Software Engineering (Montreal, Quebec, Canada) (BotSE \u201919). IEEE Press, Piscataway, NJ, USA, 43\u201347. https://doi.org/10.1109/BotSE.2019.00019\n\n[27] Mairieli Wessel, Bruno Mendes de Souza, Igor Steinmacher, Igor S. Wiese, Ivanilton Polato, Ana Paula Chaves, and Marco A. Gerosa. 2018. The Power of Bots: Characterizing and Understanding Bots in OSS Projects. Proc. ACM Hum.-Comput. Interact. 2, CSCW, Article 182 (Nov. 2018), 182:1\u2013182:18. https://doi.org/10.1145/3274451\n\n[28] Mairieli Wessel, Alexander Serebrenik, Igor Scaliante Wiese, Igor Steinmacher, and Marco Aurelio Gerosa. 2020. Effects of Adopting Code Review Bots on Pull Requests to OSS Projects. In IEEE International Conference on Software Maintenance and Evolution. 
IEEE Computer Society.\n\n[29] Mairieli Wessel and Igor Steinmacher. 2020. The Inconvenient Side of Software Bots on Pull Requests. In Proceedings of the 2nd International Workshop on Bots in Software Engineering (BotSE). https://doi.org/10.1145/3387940.3391504\n\n[30] Marvin Wyrich and Justus Bogner. 2019. Towards an Autonomous Bot for Automatic Source Code Refactoring. In Proceedings of the 1st International Workshop on Bots in Software Engineering (Montreal, Quebec, Canada) (BotSE \u201919). IEEE Press, Piscataway, NJ, USA, 24\u201328. https://doi.org/10.1109/BotSE.2019.00015\n\n[31] Yue Yu, Huaimin Wang, Vladimir Filkov, Premkumar Devanbu, and Bogdan Vasilescu. 2015. Wait for It: Determinants of Pull Request Evaluation Latency on GitHub. In 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories. 367\u2013371. https://doi.org/10.1109/MSR.2015.42\n\n[32] Yangyang Zhao, Alexander Serebrenik, Yuming Zhou, Vladimir Filkov, and Bogdan Vasilescu. 2017. The impact of continuous integration on other software development practices: a large-scale empirical study. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering. IEEE Press, 60\u201371.\n\n[33] Thomas Zimmermann. 2016. Card-sorting: From text to themes. In Perspectives on Data Science for Software Engineering. Elsevier, 137\u2013141.", "source": "olmocr", "added": "2025-06-23", "created": "2025-06-23", "metadata": {"Source-File": "/home/nws8519/git/adaptation-slr/studies_pdfs/017-wessel.pdf", "olmocr-version": "0.1.76", "pdf-total-pages": 6, "total-input-tokens": 20721, "total-output-tokens": 9133, "total-fallback-pages": 0}, "attributes": {"pdf_page_numbers": [[0, 4800, 1], [4800, 11479, 2], [11479, 17791, 3], [17791, 21431, 4], [21431, 27885, 5], [27885, 37699, 6]]}}
{"id": "c7dea1a78813f0c8658c911e1216f01fe17a4b05", "text": "Open Source Software Sustainability: Combining Institutional Analysis and Socio-Technical Networks\n\nLIKANG YIN, University of California, Davis, USA\nMAHASWETA CHAKRABORTI, University of California, Davis, USA\nYIBO YAN, University of California, Davis, USA\nCHARLES SCHWEIK, University of Massachusetts Amherst, USA\nSETH FREY, University of California, Davis, USA\nVLADIMIR FILKOV, University of California, Davis, USA\n\nCCS Concepts: \u2022 Human-centered computing \u2192 Empirical studies in collaborative and social computing.\n\nAdditional Key Words and Phrases: Institutional Design; Socio-technical Systems; OSS Sustainability\n\nACM Reference Format:\nLikang Yin, Mahasweta Chakraborti, Yibo Yan, Charles Schweik, Seth Frey, and Vladimir Filkov. 2022. Open Source Software Sustainability: Combining Institutional Analysis and Socio-Technical Networks. Proc. ACM Hum.-Comput. Interact. 6, CSCW2, Article 404 (November 2022), 23 pages. https://doi.org/10.1145/3555129\n\nABSTRACT\nSustainable Open Source Software (OSS) forms much of the fabric of our digital society, especially successful and sustainable ones. But many OSS projects do not become sustainable, resulting in abandonment and even risks for the world\u2019s digital infrastructure. Prior work has looked at the reasons for this mainly from two very different perspectives. In software engineering, the focus has been on understanding success and sustainability from the socio-technical perspective: the OSS programmers\u2019 day-to-day activities and the artifacts they create. In institutional analysis, on the other hand, emphasis has been on institutional designs (e.g., policies, rules, and norms) that structure project governance. Even though each is necessary for a comprehensive understanding of OSS projects, the connection and interaction between the two approaches have been barely explored.\n\nIn this paper, we make the first effort toward understanding OSS project sustainability using a dual-view analysis, by combining institutional analysis with socio-technical systems analysis. In particular, we (i) use linguistic approaches to extract institutional rules and norms from OSS contributors\u2019 communications to represent the evolution of their governance systems, and (ii) construct socio-technical networks based on longitudinal collaboration records to represent each\n\nAuthors\u2019 addresses: Likang Yin, lkyin@ucdavis.edu, University of California, Davis, CA, USA; Mahasweta Chakraborti, mchakraborti@ucdavis.edu, University of California, Davis, CA, USA; Yibo Yan, ybyan@ucdavis.edu, University of California, Davis, CA, USA; Charles Schweik, cschweik@umass.edu, University of Massachusetts Amherst, MA, USA; Seth Frey, sethfrey@ucdavis.edu, University of California, Davis, CA, USA; Vladimir Filkov, vfilkov@ucdavis.edu, University of California, Davis, CA, USA.\n\nPermission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. 
Request permissions from permissions@acm.org.\n\n\u00a9 2022 Copyright held by the owner/author(s). Publication rights licensed to ACM.\n2573-0142/2022/11-ART404 $15.00\nhttps://doi.org/10.1145/3555129\n\nProc. ACM Hum.-Comput. Interact., Vol. 6, No. CSCW2, Article 404. Publication date: November 2022.\nproject\u2019s organizational structure. We combined the two methods and applied them to a dataset of developer digital traces from 253 nascent OSS projects within the Apache Software Foundation (ASF) incubator. We find that the socio-technical and institutional features relate to each other, and provide complimentary views into the progress of the ASF\u2019s OSS projects. Refining these combined analyses can help provide a more precise understanding of the synchronization between the evolution of institutional governance and organizational structure.\n\n1 INTRODUCTION\n\nOpen Source Software (OSS) is a multi-billion dollar industry. A majority of modern businesses, including all major tech companies, rely on OSS without even knowing it. OSS contributions are an important manifestation of computer-supported collaborative work, for the high degree of technical literacy typical of OSS contributors. Even though this popularity attracts many software developers to open source, more than 80% of OSS projects are abandoned [37].\n\nThe failure of collaborative work in OSS has received attention from two perspectives. In software engineering, the focus has been on understanding success and sustainability from the socio-technical perspective: the OSS developers\u2019 day-to-day activities and the artifacts they create. In the management domain, on the other hand, emphasis has been on institutional designs (e.g., policies, rules, and norms) that structure governance and OSS project administration. In particular, systems that generate public goods address these and other endemic social challenges by creating governance institutions for attracting, maintaining, incentivizing, and coordinating contributions. Ostrom [32] defines institutions as \u201c\u2026 prescriptions that humans use to organize all forms of repetitive and structured interactions\u2026\u201d. Institutions guide interactions between participants in an OSS project, and can be informal such as established norms of behavior, or more formalized as written or codified rules. These norms and formalized rules, along with the mechanisms for rule creation, maintenance, monitoring, and enforcement, are the means through which collective action in OSS development occur [37], and they can be tiered or nested, as in the context of OSS projects embedded within an overarching OSS nonprofit organization.\n\nBoth methods have separately been shown to be utilitarianly describing the state of a process, however, combining the two perspectives has been barely explored. In this paper, we undertake a convergent approach, considering from one side OSS projects\u2019 socio-technical structure and the other aspects of their institutional design. Our goal is to use these two perspectives synergistically, to identify when they strengthen and complement each other, and to also refine our understanding of OSS sustainability through the two methodological approaches. 
Central to our approaches is the idea that trajectories of individual OSS projects can be understood in the convergent framework through the context provided by similar projects that are already readily sustained or have been abandoned.\n\nWe leverage a previously published dataset [47] of traces representing OSS developers\u2019 day-to-day activities as part of the Apache Software Foundation Incubator (ASFI) project. These developers are a part of projects that have decided to undergo the process of incubation, toward becoming part of the ASF, and benefiting from the services it provides to member projects. The dataset includes historical traces and a sustainability label (graduation or retirement) for each project. Graduation is an indication of successful incubation and the readiness of a nascent project to join ASF proper, otherwise the project is retired. In other words and importantly, in this paper, we use the ASFI project outcomes of graduation or retirement as a measure of sustainability of the project. We assume that graduated projects are sustained longer than retired ones, although that might not always be the case\\textsuperscript{1}. But to graduate, OSS projects have to clear key hurdles: they must demonstrate that they can (1) produce new releases, and (2) attract new developers. Both of these factors arguably are key to the sustainability of OSS projects.\n\nWe utilize this dataset to study the extent to which graduated and retired projects differ from each other, from the point of view of both the socio-technical structure and the institutional governance. On the socio-technical side, we construct the monthly longitudinal social and technical networks for each project, and calculate several measures describing the features of the networks. On the institutional governance side, we implement a classifier trained on manual annotations of institutional statements in the publicly accessible email communications among ASF participants. Then we compare the findings of our socio-technical and institutional metrics for project-level and individual-level activities. Next, we perform exploratory data analyses and deep-dive case studies, and eventually, we look at how socio-technical measures associate with the prevalence of institutional statements, and at evolutionary trajectories during OSS projects\u2019 incubation toward sustainability. In summary, we find that:\n\n- We can effectively extract governance content from email discussions in the form of institutional statements, and they fall into 12 distinguishable topics.\n- Projects with different graduation (i.e., sustainability) outcomes differ in how much governance discussion occurs within their communities, and also in their socio-technical structure.\n- Self-sustained projects (i.e., graduated) have a more socially active community, achieving it within their first 3 months of incubation, and they demonstrate more active contributions to documentation and more active communication of policy guidance via institutional statements.\n- A project\u2019s socio-technical structure is temporally associated with the institutional communications that occur, depending on the role of the agent (mentor, committer, contributor) communicating institutional statements.\n\nTo provide the most relevant context, recently, Yin et al. [46] showed that socio-technical networks can be used to effectively predict whether a project will graduate or retire from the ASF incubator. That work did not include any institutional or governance analysis. 
Here, we focus on closing the gap by studying the relationship between the organizational structure (i.e., the socio-technical system) and institutional governance in peer-contributed OSS projects. Our study is the first attempt to provide a common framework for simultaneous socio-technical and institutional analysis of OSS projects, in order to describe and understand a process affected by both: a project gaining a self-sustaining and self-governing community and eventually graduating from the ASF incubator. We are hopeful that refining this convergent approach of structural and institutional analyses will open new ways to consider and study emergent properties like project sustainability.\n\n2 THEORETICAL FRAMEWORK\n\nHere we introduce the theories behind the two different viewpoints, Institutional Analysis and Development (IAD) and Social-Technical Systems (STS), as well as Contingency Theory serving as the glue between institutional governance and the organizational structure of OSS projects.\n\n\\textsuperscript{1}For example, it could be that some ASFI retired projects simply could not adapt to the policies and requirements set in the ASFI program but yet continue on, \u2018in the wild\u2019 or perhaps aligned with a different OSS foundation.\n\n2.1 Institutional Theory and Commons Governance\n\nOSS projects are a form of digital commons, or more precisely, Commons-Based Peer Production (CBPP) [37]. Legal scholar Yochai Benkler [2] introduced the phrase CBPP to describe situations where people work collectively over the Internet, and where organizational structure is less hierarchical. While CBPP situations are found in a variety of settings (e.g., collaborative writing, open source hardware), Benkler argues that OSS is the \u2018quintessential instance\u2019 of CBPP.\n\nThere is a relatively long history of the study of governance in commons settings, arguably led by Nobel laureate Elinor Ostrom and her groundbreaking book Governing the Commons [31]. Ostrom\u2019s Institutional Analysis and Development (IAD) framework was developed to study the governance institutions that communities develop to self-manage natural resources. Much of this research focuses on the governance and sustainability of natural resource settings, e.g., water [6], marine [19], and forest [16] settings.\n\nA key challenge in natural resource commons settings is that individuals who cannot easily be excluded from extracting resources from the pool of available natural resources often have little incentive to contribute toward the production or maintenance of that resource \u2013 what are commonly referred to as \u2018free-riders\u2019 [29]. In forest, fishery, and water settings, the free-rider problem in open access settings can lead to a problem termed by Hardin as the \u2018Tragedy of the Commons\u2019 [20]. Ostrom famously pushed back against Hardin\u2019s analysis and, over the course of a lifetime of work, highlighted that communities can avoid tragedy through hard work in developing self-governing institutions.\n\nOSS commons are fundamentally different from natural resources in that digital resources can be readily replicated and are not subject to degradation due to over-harvesting. Therefore, if over-appropriation is not a problem, is there a potential tragedy of the commons in an OSS context? Invariably the answer is yes, and it lies at the heart of the idea of OSS sustainability. 
The tragedy occurs when there are free-riders and insufficient human resources available to further develop and maintain the software and, as a result, the software project fails to achieve the functionality and use that was perhaps envisioned when it began, and becomes abandoned [36]. Ostrom and Hess [22] aptly describe this tragedy as \u2018collective inaction.\u2019\n\nOstrom\u2019s Nobel Prize-winning body of work studied how humans collectively act and craft self-governing institutional arrangements to effectively avoid the tragedy in natural resource settings. Central in this effort was the introduction and evolution of the Institutional Analysis and Development (IAD) framework [32]. Later, IAD was applied to the study of digital or knowledge commons [17, 22] and explicitly to the study of self-governance in OSS, where Schweik and English undertook the first study of technical, community, and institutional designs of a large number of OSS projects [37].\n\nThat said, prior work has found that self-governing OSS projects develop highly organized social and technical structures [5]. Those having foundation support, like the ASF, may additionally be in the process of organizing the developers\u2019 structured interactions under a second tier of governance prescriptions as required by the ASF Incubator. We refer to an individual institutional prescription as an Institutional Statement (IS), which can include rules and norms, and which we define as a shared linguistic constraint or opportunity that prescribes, permits, or advises actions or outcomes for actors (both individual and corporate) [10, 39]. Institutions, understood operationally as collections of institutional statements, create situations for structured interaction for collective action. In other words, configurations of ISs affect the way collective action is organized. In the context of ASF and OSS projects, incubator ISs can affect OSS project social and technical structure.\n\nWith IS and other approaches to institutional analysis, it becomes possible to articulate the relationships between governance, organizational, and technical variables. For example, previous studies on OSS often report code modularity as a key technical design attribute [28, 30]. Hissam et al. [23] write: \u2018A well-modularized system \u2026 allows contributors to carve off chunks on which they can work.\u2019 Open and transparent verbal discussion between OSS team members and other ASF officials (e.g., mentors) about OSS project or ASF institutional design, captured in the form of institutional statements, could then predict effort by project contributors to restructure their project\u2019s technical infrastructure to be more modular and inviting to new contributors. Using the approaches of institutional analysis, we extract institutional content from open access email exchanges between OSS project contributors to understand the role of communicated governance information in OSS project sustainability.\n\n2.2 Socio-Technical System Theory\n\nA Socio-Technical System (STS) comprises two entities [42]: the social system, where members continuously create and share knowledge via various types of individual interactions, and the technical system, where the members utilize the technical hardware to accomplish certain collective tasks. STS theory can be considered to combine the views of both engineers and social scientists, acting as an intermediary of sorts that transfers institutional influence to individuals [35]. 
The theory of STS is often referenced when studying how a technical system is able to provide efficient and reliable individual interactions [21], and how the social subsystem becomes contingent on those interactions and further affects the performance of the technical subsystem [15]. Moreover, socio-technical system theory plays an important role in analyzing collective behavior in OSS projects [3]. OSS projects have also been studied from a network point of view [12, 24]. Gonz\u00e1lez-Barahona et al. [18] proposed using technical networks, where nodes are the modules in the CVS repository and edges indicate that two modules share common committers, to study the organization of ASF projects. In socio-technical systems, organizations can intervene through long-term or short-term means. Smith et al. [40] propose two conceptual approaches, \u2018outside\u2019 and \u2018inside\u2019: \u2018outside\u2019 approaches represent the socio-technical system and are managerial in approach. \u2018Inside\u2019 approaches are more reflexive about the role of management in co-constituting the socio-technical.\n\nFrom that perspective, the Apache Software Foundation (ASF) community is a unique system that has both outside influence, i.e., regulations from the ASF board and members, and inside governance, managed or self-governed by individual Project Management Committees (PMCs).\n\n2.3 Contingency Theory, or There Are No Panaceas in Self-Governance\n\nContingency theory is the notion that there is no one best way to govern an organization. Instead, each decision in an organization must depend on its internal structure, contingent upon the external context (e.g., stakeholder [43], risk [9], schedule [45], etc.). Joslin et al. [25] find that project success is associated with the methodologies (e.g., processes, tools, methods, etc.) adopted by the project. Here, in particular, we treat the institutional statements as an abstraction of the methodologies in OSS development. As the organizational context changes over time, to maintain consistency, the project must adapt to its context accordingly. Otherwise, conflicts and inefficiency occur [1], i.e., no single organizational structure is equally effective in all cases. Similar arguments have been made in the field of institutional analysis: there are no panaceas or standard blueprints for guiding the institutional design of a collective action problem [33].\n\nTo address the conflicts caused by incompatibilities with the project\u2019s context, previous work suggests thinking holistically. Lehtonen et al. [26] consider the project environment as all measurable spatio-temporal factors when a project is initiated, processed, adjusted, and finally terminated. They suggest that the same factor can have an opposite influence on projects under a different context. Joslin et al. 
[25] consider project governance to be part of the project context, concluding that project governance can impact the use and effectiveness of project methodologies.\n\nAs per contingency theory, during ASFI projects\u2019 incubation, developers and mentors have to make timely decisions on their organizational structure, contingent on what is happening in the institutional rules and governance, and vice versa.\n\n3 RESEARCH QUESTIONS\n\nReflecting on the previous discussion, the primary goal of this paper is to demonstrate that the evolution of a project from a nascent state to a sustainable state can be studied effectively by combining the two different methodologies of socio-technical network analysis and institutional analysis.\n\nWe reported in prior sections that a variety of scholars have utilized a socio-technical systems approach to analyze collective behavior in OSS projects. We also described how institutional analysis is useful in understanding collective action in OSS settings. To enable the dual view on sustainability, we first describe and evaluate our automated approach to identifying institutional statements in project emails.\n\nRQ1: Are there institutional statements contained in ASF Incubator project email discussions? Can we effectively identify them?\n\nWith the next two research questions, we assess the utility of our convergent approach to the Institutional Analysis (IAD) and STS frameworks. In the case of the ASF incubation program, there are two eventual outcomes: either a project graduates from the ASF incubator and becomes a full-fledged ASF-associated project, or it retires without achieving that goal. In this context, we operationalize a sustainable state as one where an OSS project graduates from the ASF incubator program, rather than retires. We ask:\n\nRQ2: Is OSS project evolution toward sustainability readily observable through the dual lenses of institutional and socio-technical analysis? And how do such temporal patterns differ?\n\nPer institutional analysis theory, strategies, norms, and rules can affect the social and technical organizations of projects. Governance and organization, per social theories, must work hand-in-hand to make viable socio-technical systems. Ill-designed institutional arrangements would introduce inefficiencies into the system, and such inefficiencies may amplify deviant behaviors and irregular structures in the system. Such influential links from institutional design to the organizational structure can be, in fact, bi-directional. In effect, in a sustainable system, an ill-formed organizational structure may instigate new rules to adjust and improve such structure, further improving efficiencies in the system.\n\nThus, we hypothesize that the feedback, if any, between project governance and project organization should be observable, specifically in that intensified governance discussion should precede and/or follow changes to the project organizational structure. As a reminder, we consider institutional statements as indicators of intensified discussions of OSS project self-governance or new incubator requirements on that self-governance. We also consider socio-technical network parameters as indicators of organizational structure. 
Thus, we ask:

RQ3: Are periods of increased Institutional Statement frequency followed by changes in the project organizational structure, and vice versa?

In the following section, we introduce the methodologies we use to approach these three research questions.

4 DATA AND METHODS

To study the difference between projects that graduate from ASFI (i.e., become sustainable) and those that do not, in this paper we use a collection of large-scale data sets comprising Institutional Statements and Socio-Technical variables extracted from all graduated and retired projects of the Apache Software Foundation Incubator, ASFI. In ASFI, graduation is an indication that a nascent project is sufficiently sustainable to join ASF proper\(^2\); otherwise, the project is retired. Combing through the Apache lists, inspecting the data, and speaking to project and community members showed that almost all failures to graduate are sustainability failures. On rare occasions, projects have retired for reasons other than sustainability, e.g., because they were not a good fit for the Apache model\(^3\), although project proposals suggest that projects are generally sufficiently aware of the ASF model before entering incubation\(^4\).

For the socio-technical networks, we collected historical trace data of commits, emails, and incubation outcomes for 253 ASFI projects that have available archives of both commits and emails, from 03/29/2003 to 02/01/2021\(^5\). Among those, 204 projects have already graduated, and 49 have retired. ASF incubator projects that are still in incubation are not studied in this paper.

We collected the ASF incubator project data from the ASF mailing list archives\(^6\), which are open access and can be retrieved through the archive web pages at http://mail-archives.apache.org/mod_mbox/. They contain all emails and commits from each project's ASF incubator entry date onward, and are kept current. The project URLs follow the pattern proj_name-list_name/YYYYMM.mbox. For example, the full URL for the dev mailing list of the Apache Accumulo project, in Dec 2014, is http://mail-archives.apache.org/mod_mbox/accumulo-dev/201412.mbox. Each such .mbox file contains one month of mailing list messages from the project, for the date specified in the URL. Here dev stands for 'emails among developers'. Notably, some lists do not follow this pattern, e.g., 'ASF-wide lists' are not project-owned mailing lists, and the list 'incubator.apache.org' contains data from more than one project.
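As an illustration of this archive layout, the following minimal sketch (using only Python's standard library; the helper name is ours, not part of any ASF tooling) builds the URL for a given project, list, and month, downloads the file, and parses it with the mailbox module:

```python
import mailbox
import tempfile
import urllib.request

ARCHIVE_ROOT = "http://mail-archives.apache.org/mod_mbox"

def fetch_monthly_mbox(project, list_name, yyyymm):
    """Download one month of a project's mailing list archive as an mbox."""
    url = f"{ARCHIVE_ROOT}/{project}-{list_name}/{yyyymm}.mbox"
    path = tempfile.NamedTemporaryFile(suffix=".mbox", delete=False).name
    urllib.request.urlretrieve(url, path)  # save the raw archive locally
    return mailbox.mbox(path)

# The dev mailing list of the Apache Accumulo project, December 2014.
msgs = fetch_monthly_mbox("accumulo", "dev", "201412")
print(len(msgs), "messages; first subject:", msgs[0]["subject"])
```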
To extract Institutional Statements, we combined our email data set with a prior data set on ASF policy documents. In a given organization, institutional statements are characterized by a finite set of semantic roles (e.g., the ASF Board, mentors, contributors, etc. in ASF) and their interactions (e.g., management committees requesting reports from projects, developers voting to induct committers in ASF), in specific contexts. To account for their representation in our training corpus, we included institutional statements not only from ASF project-level email exchanges among participants, but also from ASF policy documents. The supplementary set of Institutional Statements included 328 policies, which were compiled from ASF policy documents (e.g., the Apache Cookbook, PPMC Guide, Incubator Policy, etc.) in an economic analysis of the ASF Incubator's policies [38].

4.1 Pre-processing

We collected all 1,330,003 emails across the ASF Incubator projects, from 03/29/2003 to 02/01/2021 (under mailing lists such as 'commit', 'dev', 'user', etc.). We find that 128,257 (about 9.6%) of these emails are automatically generated and broadcast by continuous integration tools (i.e., bots). Because such emails are numerous, carry little meaningful social or institutional information, and are rarely replied to by list members, we use regular expression rules to identify and eliminate them from the corpus, leaving us with 1,201,746 emails.

---

\(^2\)ASF's guide to project graduation: https://incubator.apache.org/guides/graduation.html

\(^3\)ASF's reasons behind projects' retirement: https://incubator.apache.org/projects/#retired

\(^4\)ASF incubator projects' proposals: https://cwiki.apache.org/confluence/display/INCUBATOR/Proposals

\(^5\)Our code and data are available at Zenodo: https://doi.org/10.5281/zenodo.5908030

\(^6\)During the preparation of this study, ASF moved their email archives to the Pony Mail system.

On the technical contribution side, many projects, especially those over ten years old that used SVN, employed bots that generated extensive mailings, forming outliers in the dataset. We therefore eliminate commit messages from automated bots (e.g., 'buildbot'), 253,758 out of 3,654,196 (about 6.9%) commit messages, as well as email messages from issue/bug tracking bots (e.g., 'GitBox'). Moreover, we find that some developers contributed commits by directly changing or uploading massive non-source-code files (e.g., data, configuration, and image files). Since commits to non-code files can form outliers in the data set, we apply GitHub Linguist\(^7\) to identify 731 programming language and markup file extensions, and exclude all other non-code commits (e.g., creating/deleting folders, uploading images, etc.).
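The regular-expression rules for the bot identification above are tuned to the ASF lists; a minimal sketch of such a filter, with purely illustrative patterns (the rule set actually used in the study is more extensive), looks as follows:

```python
import re

# Illustrative patterns only; not the study's actual rule set.
BOT_SENDER = re.compile(r"(buildbot|jenkins|hudson|gitbox|jira|no-?reply)", re.I)
BOT_SUBJECT = re.compile(r"(^\[jira\]|build (failed|fixed|unstable))", re.I)

def is_bot_email(sender, subject):
    """Heuristically flag CI and issue-tracker broadcasts."""
    return bool(BOT_SENDER.search(sender) or BOT_SUBJECT.search(subject or ""))

emails = [
    ("jenkins@builds.apache.org", "Build failed in Jenkins: foo #12"),
    ("dev@example.org", "Question about the release vote"),
]
kept = [e for e in emails if not is_bot_email(*e)]
print(len(kept), "human emails kept")  # -> 1
```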
4.2 Constructing Socio-Technical Networks

Network science approaches have been prominent in studying complex systems, e.g., OSS projects [4, 41]. Since networks can capture rich information about both the elements (i.e., nodes) and their interactions (i.e., edges), in this study we use socio-technical networks to anchor the abstraction of socio-technical systems. We define the projects' socio-technical structure using social (email-based) and technical (code-based) networks, extracted from their emails to the mailing lists and commits to source files. Similar to the approach by Bird et al. [3], we form a social network (a weighted directed graph) for each project in each incubation month, from the communications between developers: a directed edge from developer \(A\) to \(B\) forms if \(B\) has replied to \(A\)'s post in a thread or if \(A\) has emailed \(B\) directly. The weight of the edge represents the communication frequency between the pair of developers. The technical networks (weighted bipartite graphs) are formed in a similar way: for each project in each month, we include an undirected edge between a developer \(A\) and a source file \(F\) if developer \(A\) committed to source file \(F\) that month (excluding SVN branch names). The weight of the edge represents the committing frequency between the developer and the source file. In summary, social networks are weighted directed graphs; we form an edge between two developer nodes if one developer replied to or referenced the other's email. Technical networks are undirected bipartite graphs, with developers forming one set of nodes, coding files forming the other, and a link drawn when a developer contributed to a coding file. We use the Python package networkx for the network-related implementation.
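A minimal sketch of this construction with networkx (illustrative interaction data; the printed variable names follow Sect. 4.5) might look as follows:

```python
import networkx as nx

# One month of illustrative interactions (hypothetical developers and files).
replies = [("alice", "bob"), ("bob", "alice"), ("carol", "alice"), ("carol", "alice")]
commits = [("alice", "core/api.py"), ("alice", "core/util.py"), ("bob", "core/api.py")]

# Social network: weighted directed graph; edge weights count interactions.
social = nx.DiGraph()
for src, dst in replies:
    w = social.get_edge_data(src, dst, {"weight": 0})["weight"]
    social.add_edge(src, dst, weight=w + 1)

# Technical network: weighted bipartite graph between developers and files.
tech = nx.Graph()
devs = {d for d, _ in commits}
for dev, f in commits:
    w = tech.get_edge_data(dev, f, {"weight": 0})["weight"]
    tech.add_edge(dev, f, weight=w + 1)

n = social.number_of_nodes()
print("s_num_nodes:", n)
print("s_graph_density:", nx.density(social))
print("s_avg_clustering_coef:", nx.average_clustering(social.to_undirected()))
print("s_weighted_mean_degree:",
      sum(d for _, d in social.degree(weight="weight")) / n)
print("t_num_dev_nodes:", len(devs))
print("t_graph_density:", nx.bipartite.density(tech, devs))
```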
4.3 Extracting Institutional Statements

We combined the email exchange data set with the ASF policy document data to fine-tune a BERT-based [8] classifier for the automatic detection of ISs (see Sect. 2.1 for the definition of an IS).

To start, we hand-annotated a small subset of our data for ISs as follows. After selecting a random subset of 313 email threads from incubator project lists, two hand-coders labeled the sentences in them as 'IS' or 'Not IS', on the basis of whether they fit the definition of Institutional Statements. They resolved disagreements through discussion and recorded these conclusions, achieving a peak out-of-sample agreement of 0.75 to 0.80. A sentence was coded as an IS only if it was a complete sentence; fragments such as parenthetical mentions of rules or resources were not annotated as positive. This resulted in 6,805 labeled sentences (i.e., 'IS' or 'Not IS'), 273 of which were labeled as IS.

We treated all 328 policies from the ASF documents as institutional statements, since policy documents arguably provide more formal institutional sample text than is the norm in the email discussions. Thus, we had 601 Institutional Statements in total across these two coded datasets.

Institutional statements refer to prescriptions and shared constraints, in the form of norms, rules, and strategies, that are meant to mobilize and organize actors towards collective action. Table 1 provides some examples of developer exchanges that encompass norms and strategies with institutional implications.

---

\(^7\)GitHub Linguist: https://github.com/github/linguist

Table 1. Selected Examples of Institutional Statements Found in ASFI Project Email Discussions.

| Project | Date | Institutional Statements |
|---------|------|--------------------------|
| Airflow | 21 Dec 2016 | … running in our Lab there is virtually no restriction what we could do, however I will hand select people who have access to this environment. I will also hold ultimate power to remove access from anyone … |
| ODF | 07 Dec 2011 | Please vote on releasing this package as < Package >. The vote is open for the next 72 hours and passes if a majority of at least three +1 ODF Toolkit PMC votes are cast … |
| Airflow | 24 Feb 2017 | … Next steps: 1) will start the voting process at the IPMC mailinglist. … So, we might end up with changes to stable. … 2) Only after the positive voting on the IPMC and finalisation I will rebrand the RC to Release. |

The first example, from the Airflow project and dated 12/21/2016, involves a situation where certain developers find the computational infrastructure provided by ASF insufficient for their testing and development requirements, and discuss setting up alternate arrangements to address the bottleneck. Faced with resource limitations, one developer offers an externally hosted cloud environment from his private resources. The selected excerpt quotes the individual establishing the terms for using the alternate resources he may offer to project members, including access permissions and usage restrictions. ASF projects conduct votes from time to time to gather community consensus on matters of significance. The second example, from the ASFI project ODF and dated 12/07/2011, describes the stepwise process members are expected to follow project-wide to conduct a vote deciding on the approval of the release of the current candidate under development. The final example, from Airflow and dated 02/24/2017, pertains to a similar process: a developer discusses the voting process and its implications, especially the subsequent steps that must be fulfilled to ensure product release.

BERT-based Sequential Classifier. In natural speech, such as emails, ISs can appear as whole sentences, parts of sentences, or spans of multiple sentences. They are also relatively sparse, with their institutional quality dependent on their inherent interpretation as well as their context. Framing IS extraction as a sequential sentence classification task over self-contained email segments, instead of labeling individual sentences in isolation, helps take contextual cues into account.

We used the sequential sentence classifier developed by Cohan et al. [8], which leverages the Bidirectional Encoder Representations from Transformers (BERT) sequence classifier [11] to classify sentences in documents. BERT can be employed to generate the representation of a sentence through joint encoding over its neighboring sentences, leveraging the tuned embedding of the corresponding sentence separator '[SEP]' token for downstream applications such as sentence labeling, extractive summarization, etc. Thus, our classifier comprises BERT, for attention-based joint encoding across sentences, followed by a feedforward classifier that predicts sentence labels based on these separator '[SEP]' vectors.

To test the performance of the classifier on email IS extraction, we held out 40 email threads (12.5%, randomly split) of our 313 hand-annotated email threads. The training was performed on the combined set of the remaining 273 coded email threads and the ASF policy documents. The coded training and testing email data contained 231 and 42 institutional statements, respectively. For both training and testing, email threads were processed to generate classifier inputs as follows. To include neighboring context while meeting the length limits of the BERT-based text classifier, the sentences of each email document were first chunked into segments using a sliding window of up to 256 BERT sub-word (wordpiece) tokens. This resulted in segments containing 6 contiguous sentences each on average, comprising as many full sentences as could be accommodated within the specified subword limit. The rolling window had a step of 1 full sentence. We generated 3322 and 384 email segments for training and testing, respectively.
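A minimal sketch of this sliding-window segmentation, assuming a HuggingFace tokenizer (the bert-base-uncased checkpoint here is our assumption, not necessarily the one used, and per-sentence separator tokens are ignored for brevity), is:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
MAX_SUBWORDS = 256  # per-segment budget of BERT wordpiece tokens

def segment_sentences(sentences, max_subwords=MAX_SUBWORDS):
    """Pack as many contiguous full sentences as fit within the subword
    budget into each segment; the window then advances by one sentence."""
    lengths = [len(tokenizer.tokenize(s)) for s in sentences]
    segments = []
    for start in range(len(sentences)):
        used, end = 0, start
        while end < len(sentences) and used + lengths[end] <= max_subwords:
            used += lengths[end]
            end += 1
        if end > start:  # at least one sentence fits
            segments.append(sentences[start:end])
    return segments
```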
For the policy documents, each policy with its sentences was treated as a segment, leading to 328 additional segments in the training data. Several reasons support the inclusion of the ASF policies as additional positive training examples. (1) In terms of semantic information, they are about institutional themes and actions, which we expected to help the language model learn what sets institutional themes apart from regular development activities and artifacts. (2) ASF policies are critical to common pool resource management and institutional operations, as they describe roles and responsibilities, regulate actions, and are often invoked in email discussions\(^8\). (3) The formal policies are the source texts that in-email references to ISs draw from when developers discuss ASF's rules in email. From this perspective, they are vital source texts for detecting these statements as they occur in email settings. Hence, while sourced from formal bylaws outside the emails, ASF policies are institutional statements that are relevant to, and recur in, developer conversations, and we therefore included them in the training data.

We fine-tuned our classifier end-to-end against the corresponding labels for the sentences in each segment. The training stage was conducted with a batch size of 16 and a learning rate of \(2 \cdot 10^{-5}\), for 6 epochs. All other hyperparameters were left at their defaults. To account for the class imbalance, we randomly oversampled training data segments that had at least one IS sentence to match the number of segments that had no IS sentences (1:1). In both the training and prediction phases, we did not incorporate any temporal information other than the sequentiality captured by the segments; that is, when extracting institutional statements, the model does not require the exact time of the discussion.

During testing or prediction, because the length of the context preceding or following a sentence varies across segments, we treat a sentence in an email as a 'positive' classification if it has been detected as an IS in at least one segment. The performance of the model is reported in Sect. 5.1 in terms of precision, recall, and F1-score with respect to the positive ('IS') label for sentences in the test email set.

4.4 Topic Identification in Institutional Statements

The purpose of topic modeling is to describe the texts in a given corpus and to provide numerically measurable relationships among them, e.g., topic identification, similarity measures, etc. We use a Latent Dirichlet Allocation (LDA) model to obtain semantically meaningful topics, to better understand the extracted institutional statements. LDA is an unsupervised clustering approach [48] which, given a set of documents, iteratively discovers the relevant topics present in them, based on word distributions and their relative prevalence in each document. We used LDA to identify prominent topic clusters occurring among all institutional statements extracted from our email archives by our trained classifier (see Sect. 4.3). No pre-identified topic labels from our coded email set were used to train the LDA model. We use the coherence score provided by the gensim package [44] to optimize the performance of the LDA model with respect to the number of topics; a higher coherence score represents better clustering performance. We select the LDA model with the highest coherence score and draw the clusters from it. However, since the LDA model does not automatically generate a label for each cluster, we assign labels intuitively, based on our domain knowledge of the ASF incubation process. Naming each topic cluster certainly carries some interpretation risk; however, we believe that providing all top keywords for each cluster reduces that risk.
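A minimal sketch of this coherence-based model selection with gensim (toy documents stand in for the tokenized ISs; the 'c_v' coherence measure is our assumption) is:

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel
from gensim.models.coherencemodel import CoherenceModel

# Toy stand-ins for tokenized institutional statements.
docs = [
    ["vote", "release", "candidate", "close", "day"],
    ["license", "file", "copyright", "policy", "compliance"],
    ["report", "board", "submit", "review", "meeting"],
] * 10  # repeated so the models have something to fit
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

# Sweep the number of topics and keep the most coherent model.
best = max(
    (LdaModel(corpus, id2word=dictionary, num_topics=k, random_state=1)
     for k in range(2, 16)),
    key=lambda lda: CoherenceModel(model=lda, texts=docs,
                                   dictionary=dictionary,
                                   coherence="c_v").get_coherence(),
)
print("optimal number of topics:", best.num_topics)
print(best.print_topics(num_words=5))
```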
---

\(^8\)https://lists.apache.org/thread/zykybdvnk9cwx03pnrfl2br9nkcb7q3f

Table 2. Summary statistics for the monthly socio-technical variables and the counts of institutional statements from project mentors, committers, and contributors, after removal of the top 2% of outliers. The numbers in parentheses denote the values after the removal of inactive months (i.e., months with no emails/commits). The prefix s_ denotes features of the social network, while t_ denotes features of the technical network.

| Statistic | Mean | St. Dev. | 25% | 75% |
|-----------|------|----------|-----|-----|
| s_num_nodes | 13.04 (16.96) | 14.56 (15.04) | 4 (7) | 17 (22) |
| s_graph_density | 0.30 (0.30) | 0.27 (0.22) | 0.12 (0.14) | 0.40 (0.40) |
| s_avg_clustering_coef | 0.22 (0.29) | 0.23 (0.21) | 0 (0.11) | 0.39 (0.43) |
| s_weighted_mean_degree | 11.83 (15.56) | 12.03 (12.81) | 4 (7.43) | 16 (19.71) |
| t_graph_density | 0.37 (0.68) | 0.41 (0.32) | 0 (0.36) | 1 (1) |
| t_num_dev_nodes | 1.18 (2.21) | 1.59 (1.60) | 0 (1) | 2 (3) |
| t_num_file_nodes | 60.99 (114.83) | 153.94 (197.25) | 0 (6) | 38 (126) |
| t_num_file_per_dev | 28.79 (53.57) | 80.46 (104.23) | 0 (4) | 20 (54.5) |
| num_IS_mentor | 15.46 (15.99) | 24.46 (25.01) | 0 (1) | 20 (20) |
| num_IS_committer | 9.34 (12.89) | 19.36 (22.36) | 0 (0) | 10 (16) |
| num_IS_contributor | 13.18 (16.36) | 21.72 (24.42) | 0 (2) | 18 (21) |

4.5 Variables of Interest

We draw institutional and socio-technical project features and variables on the basis of each framework's predictions for our research questions. Our socio-technical variables are drawn from a recent study on forecasting the sustainability of OSS projects [46], which showed the high predictive power of such variables. All metrics are aggregated over monthly intervals, for each project, from the start to the end of its incubation.

**Longitudinal Socio-Technical Metrics:** For each project, for each month, we constructed the social and technical networks, and from them calculated various organizational-structure measures. In our tables and results, the prefix t_ in a variable's name indicates that it belongs to the technical (code) network, while the prefix s_ indicates the social (email) network. For the monthly social networks, we calculate the weighted mean degree s_weighted_mean_degree (the sum of all nodes' weighted degrees divided by the number of nodes), the average clustering coefficient s_avg_clustering_coef (the average ratio of closed to open triangles), and the graph density s_graph_density.
In the technical bipartite networks, for each month, we calculate the number of unique developer nodes t_num_dev_nodes, the number of unique file nodes t_num_file_nodes, the number of files per developer t_num_file_per_dev, and the graph density t_graph_density.

**Institutional Statements Frequency Metrics:** For each project, for each month, we added up the ISs in all emails of that month sent by each of three separate, identifiable groups of people: ASF mentors (num_IS_mentor), registered ASF committers (num_IS_committer), and contributors (num_IS_contributor). We summarize their statistics in Table 2. As noted earlier, a final group of emails, sent by bots, is not accounted for here; like calendar entries, they may be useful, but they are not the object of our study.

4.6 Granger Causality

Time series data allows for the identification of relationships between temporal variables that go beyond association. One such approach, **Granger causality**, is a statistical test for identifying quasi-causality between pairs of temporal variables [13]. Given two such variables, $X_t$ and $Y_t$, the Granger causality test calculates the p-value of $Y_t$ being generated by a statistical model including only $Y$'s prior values, $Y_{t-1}, Y_{t-2}$, etc., versus it being generated by a model that, in addition to $Y$'s prior values, also includes $X$'s prior values $X_{t-1}, X_{t-2}$, etc. Thus, Granger causality simply compares a base model involving only $Y$ to a more complex model involving both $Y$ and $X$, and tests whether the latter fits the data better. In the context of Granger causality, prior values are called lagged values, with $X_{t-1}$ having a lag of 1, $X_{t-2}$ having a lag of 2, etc. If the Granger causality test returns a small enough p-value (e.g., $< 0.01$), it is interpreted as a rejection of the null hypothesis, thus establishing that $X$ Granger-causes $Y$.

The Granger causality test assumes that the time series to which it is applied are stationary, i.e., that they have no trend or seasonal effects, so stationarity must be tested before running the Granger test. We use the augmented Dickey-Fuller test [7], as implemented in `adf.test` from the R package `tseries` [27], to test stationarity. Both institutional and socio-technical variables were found to be stationary. We note that a distinction is typically made between scientific causality, based on controlled experiments, and Granger causality, with the latter satisfying only one (the precursor property) of the multiple properties of causality. For that reason, when Granger causality is used, the word 'causality' is always preceded by 'Granger'. We also note that this test does not identify the sign (i.e., positive or negative), if any, of the Granger causality; it simply indicates whether one exists. We use the `pgrangertest` function, from the R package `plm`, to test Granger causality.
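Our analysis relies on the R implementations cited above; purely for illustration, an equivalent check can be sketched in Python with statsmodels, on synthetic monthly series in which X leads Y by one step:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, grangercausalitytests

rng = np.random.default_rng(1)
x = rng.normal(size=120)                        # e.g., monthly IS counts
y = np.roll(x, 1) + 0.1 * rng.normal(size=120)  # lags x by one month

# Stationarity check (augmented Dickey-Fuller) before the Granger test;
# a small p-value rejects the unit-root null.
for name, series in (("X", x), ("Y", y)):
    print(f"ADF p-value for {name}: {adfuller(series)[1]:.4f}")

# Does X Granger-cause Y? Column order: [effect, candidate cause].
res = grangercausalitytests(np.column_stack([y, x]), maxlag=2)
print("lag-2 F-test p-value:", res[2][0]["ssr_ftest"][1])
```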
5 RESULTS

In this section, we answer the proposed research questions by adopting a dual view, from the institutional analysis and socio-technical network perspectives. We first establish the utility of our IS identification methodology.

5.1 RQ1: Are there institutional statements contained in ASF Incubator project discussions? If any, can we effectively identify the content of ISs?

Detecting Institutional Statements. First, we focus on the ability of our BERT-based classifier to identify institutional statements in the emails. When tested on the 857 held-out sentences from the 40 email threads in our test set (see Sect. 4.3), our classifier achieved a precision of 0.667, a recall of 0.681, and an F1 score of 0.674 on classifying Institutional Statements, demonstrating that it can extract ISs from developer email exchanges despite only 5.1% of the sentences being ISs.

For model validation against overfitting, we sought to perform stratified cross-validation (CV) on our training data. We note that our data was not ideal for a CV study: we had (1) a limited data size, (2) an uneven distribution of ISs across the email threads, and (3) class imbalance between IS and non-IS sentences. For example, due to the limited data size, emails with a high IS density could end up in the training but not the test split, dramatically increasing the variance of cross-validation results. To ameliorate that, for more uniform stratification, we chunked the 273 threads in our training data into 442 sub-emails of 20 contiguous sentences each (the email threads had a mean length of 22 sentences). We fine-tuned our classifier end-to-end against the corresponding labels for the sentences in the sub-emails. The subsequent input segment generation and training of the pipeline were otherwise kept unchanged. We obtained a mean F1 score of 0.603 on the positive labeling of IS sentences, although high variability between folds persisted.

We consider these performance results satisfactory given that we had a small and highly imbalanced data set (273 ISs out of 6,805 sentences). There are strong indications that increasing the number of positive examples in the training data set will further increase our classifier's performance\(^9\). Of course, it is challenging to ascertain whether classifier performance varies across projects, due to the limited labeled data per project.

---

\(^9\)When we fine-tuned the classifier with only the 273 training email threads (i.e., without Institutional Statements from the ASF policy documents), the F1 score for the positive label was about 20% lower.

Fig. 1. Comparing graduated (in blue) vs. retired (in red) projects along the number of Institutional Statements (IS) (color online). The Mann-Whitney U test p-val is sufficiently small (in brackets), suggesting significant differences between the groups.

We ran our classifier on the full corpus of 1,201,746 emails (after bot email removal) across all ASF incubator projects. It identified 313,140 ISs in the emails, for an average of 0.261 sentence-level ISs per email. Table 2 shows descriptive statistics for both the socio-technical variables and the number of institutional statements from project mentors, committers, and contributors, calculated in monthly intervals, per project.

We find that the classifier's errors are also informative. In one set of false positives, participants described plans for an event occurring outside of Apache and the relevant incubator project, not the kind of process or behavioral constraint typical of ISs. It was probably detected as an IS due to its semantic similarity to the rules and guidelines that make up other positive examples. Conversely, the sentence 'Send it to <EMAIL> and see what the reaction is' was missed as an IS, despite appearing in the context of contributor agreements.
This miss is likely due to the fact that many such recommendations in the emails would not be considered institutional, because they refer to a particular individual as an individual, rather than in their institutional role.

Institutional Statements Over Roles and Sustainability Status. We turn to some exploratory analysis, to demonstrate the utility of our chosen features when reasoning about differences between graduated and retired projects. Comparing graduated and retired projects, we find a significant difference in the number of ISs. For example, in Figure 1(a), the number of ISs sent by mentors in graduated projects is statistically higher than in retired projects (we use the Mann-Whitney U test for the difference between the groups). This, along with the fact that graduated projects tend to be more active socially overall than retired projects (i.e., more email exchanges), suggests that the mentors of retired projects are concerned with the progress of the projects' communities, and thus most of their email content is about rules and guidance. On the other hand, it is also plausible that mentors engage more socially and less institutionally with graduated projects, which may benefit those projects more. The numbers of ISs sent by committers and contributors show similar patterns. We investigate them longitudinally in the next section.
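A group comparison of this kind is straightforward to reproduce with scipy; the counts below are hypothetical stand-ins for the per-project mentor IS counts behind Figure 1(a):

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(7)
# Hypothetical per-project mentor IS counts for the two outcome groups.
graduated = rng.poisson(lam=18, size=204)
retired = rng.poisson(lam=12, size=49)

stat, pval = mannwhitneyu(graduated, retired, alternative="greater")
print(f"U = {stat:.0f}, one-sided p = {pval:.2e}")
```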
Topic Identification in Institutional Statements. We use the Latent Dirichlet Allocation (LDA) model to study the token-level topics in institutional statements. By optimizing the LDA coherence score, we obtain an optimal number of 12 topics. The result further enables us to study which words are important to each topic. We present the clusters of top words for each topic in Table 3.

Table 3. Topics Identified in Institutional Statements.

| ID | Heuristic Topic | Top Sample Words |
|----|-----------------|------------------|
| 1 | Progress Report | review, require, meeting, board, submit, report |
| 2 | Collective Decision | vote, start, proposal, thread, close, day, bind |
| 3 | Project Release | release, issue, think, fix, branch, policy |
| 4 | Community | project, email, send, community, behalf, incubation, talk |
| 5 | Report Review | board, report, time, meeting, prepare, reminder, review |
| 6 | Mailing List Issues | list, mailing, discussion, question, issue, comment, request |
| 7 | Documentation | update, wiki, page, website, documentation, link, doc |
| 8 | Software Testing | release, source, build, test, note, artifact, check |
| 9 | Licensing Policy | license, file, software, version, copyright, compliance |
| 10 | Routine Work | project, committer, help, work, way, code |
| 11 | Mentorship | podling, report, form, mentor, know, sign, month, wish |
| 12 | Software Distribution | work, repository, information, file, distribute, commit |

As this table reveals, the words extracted from the institutional statements cluster well and are clearly distinguishable from each other. For example, in the first topic (i.e., 'Progress Report'), there is a cluster of words – 'review', 'board' (which relates to the ASF board), 'submit', and 'report' – all of which are associated with the important incubator rule that requires projects to submit regular progress reports. In topic 7, words like 'update', 'wiki', 'page', 'website', and 'documentation' emerge, all related to the requirements projects need to address concerning their websites and documentation. These results extend institutional theory into the software engineering domain, arguably showing that ISs are associated with OSS sustainability, and suggest diving deeper into the connections between the socio-technical system and institutional analysis.

**RQ1 Summary:** We demonstrated that institutional analysis methodologies can capture differences between graduated projects and retired projects. We also showed that we can effectively identify meaningful institutional statements, and common topics, from ASF incubator projects' emails.

5.2 RQ2: Is OSS project evolution toward sustainability observable through the dual lenses of institutional and socio-technical analysis? And how do such temporal patterns differ?

In this section, our goal is to contrast graduated and retired projects over time in both the IS space and the socio-technical space. Projects exit the ASF incubator at different times, so the variance is larger toward the end of incubation. Therefore, we restrict ourselves to the first 24 months for all projects (more than 60% of projects stayed in the incubator for at most 24 months).

**Topic Evolution Over Time.** After identifying the words that contribute to the various identified topics, we aggregate over all projects to obtain the volume of each topic in each month, measured as the number of tokens contributing to that topic. Moreover, since there are trends in the number of ISs, we subtract the mean volume for each month, separately for the graduated and retired projects. We present the results in Figure 2, where the x-axis is the number of months after incubation start, and the y-axis indicates the volume relative to the mean.

The results of the Mann-Whitney U test show that 10 out of 12 topics differ significantly in their means between graduated and retired projects (p-val < 0.01); not significant were topic 9 (licensing policy) and topic 12 (software distribution). Additionally, the augmented Dickey-Fuller test suggests that, over time, 9 out of 12 topics are not stationary (i.e., temporal trends exist, with p-val < .01); the exceptions are topic 2 (collective decision), topic 6 (mailing lists), and topic 12 (software distribution). These testing results prompt us to analyze the difference in project-level dynamics between graduated and retired projects.

Fig. 2. Topic evolution for graduated projects (in blue) compared to retired projects (in red). The x-axis indicates the i-th month from incubation start and the y-axis represents the relative volume of the topics. The Mann-Whitney U test found 10 out of 12 topics significantly different in their means between graduated and retired projects (p-val < 0.01); not significant were topic 9 (licensing policy) and topic 12 (software distribution).

We observe an increasing trend in Topic 1, 'Progress Report', with a small seasonal effect, suggesting that projects are learning the 'Apache Way' and more actively discussing their regular project reporting over time. This seasonal effect is more pronounced in Topic 5 ('Report Review'). Project releases, documentation, and software testing are all connected to the number of people participating regularly.
Retired projects are on average smaller than graduated ones, which is the likely explanation for the differences. E.g., in Figure 3(f), we show that graduated projects, on average, have more source files than retired projects. Moreover, we find that Topic 9, 'licensing policy', has an increasing trend in the earlier stages of incubation (e.g., months 1-7), which makes sense: the shift from a project's prior OSS license to the license required by ASF is an important discussion that projects would want to address early on.

By contrast, IS language related to software testing is relatively rare at the beginning of project incubation. This suggests that in the earlier stages of incubation, developers are more likely focused on the transition to the incubator and perhaps less on new code development and testing. These transitions were, however, implemented quickly, with testing discussions increasing rapidly in incubation months 3, 4, and 5.

Comparing graduated and retired projects, we find Topic 10, 'Routine Work', to be the dominant topic for both types of projects through almost all of their incubation (i.e., it remains high volume compared to other topics). We also find that graduated projects tend to be more active on Topic 7, 'Documentation', and Topic 3, 'Project Release'. Interestingly, on the other hand, mentorship-related ISs (Topic 11) are more active in retired projects than in graduated ones. One possible reason is that retired projects did seek help from their mentors when experiencing downturns, prompting further institution-related statements.

Fig. 3. Average monthly IS and ST variables for graduated and retired projects. Top: the IS measures; bottom: the ST measures. Shading indicates one standard error from the mean. Month index 0 indicates the incubation starting month (color online).

**Metric Evolution.** We continue by exploring the evolution of our metrics over time. Looking at the mentors' ISs, shown in Figure 3(a), we can see that even at the beginning of their incubation, mentors email a greater number of ISs to projects that eventually graduate than to ones that eventually retire.

Next, we see that the number of ISs in mentor emails declines for both graduated and retired projects before month 5, suggesting that ASFI mentor activity may decrease after incubating projects work through the first steps of the incubation process.

Then, we visually identify an increasing trend in ISs from mentors around month 6 for graduated projects and month 5 for retired ones. One possible reason is that mentors start helping projects when the projects are experiencing difficulties or downturns. This is consistent with ASF mentorship: during the early stage of incubation, developers are required to make institution-related decisions, e.g., voting on reports, discussing the ASF-required licensing, and handling community-related issues, and it is in these kinds of areas that mentors come to help.

On the socio-technical network side, shown in Figure 3(d), for the first 6 months we can see that graduated projects have a clear increasing trend in the number of nodes in their social networks, while the number seems constant in retired projects.
We can see a slight decrease around months 10 to 12 for both types of projects, suggesting that 10 months might be a good time for mentors to intervene and motivate their projects if the projects are experiencing difficulties.

**RQ2 Summary:** We identify socio-technical and institutional signatures of OSS project evolution, evidence that this evolution differs between graduated and retired projects, and evidence that these patterns can even be distinguished by institutional heuristic topics. On the institutional side, both graduated and retired projects have more stable institutional topics during their first 3 months. On the socio-technical network side, graduated projects keep attracting community members over their first 6 months, while retired projects are unstable during their first 3 months.

5.3 Case Study: Association Between Institutional Governance and Organizational Structure

To communicate concretely how the institutional and socio-technical dimensions interact within the ASFI ecosystem, we showcase four diverse instances of their mutual interrelationship.

Case A. In July 2011, the HCatalog project announced a vote for its first Release Candidate (RC), the first officially distributed version of its code. Because a project's RCs reflect on the whole ASF, they require approval from the foundation after project contributors have given their approval. In preparation for the first vote, developers double-checked the installation process and reported missing files and features. This drove contributions to the code and documentation, e.g., release notes were added after being reported missing. The contributors then cast their votes. With four people's votes, the product was approved and a proposal was forwarded to Apache Incubator leadership for approval.

Case B. In December 2010, an independent developer emailed the Jena project community to share their idea for a new feature, asking how to proceed toward contributing it. Their query included policy questions, such as whether they must obtain an Individual Contributor License Agreement (ICLA). A developer responds that the policy does not require an ICLA for the type of smaller contribution that the volunteer is proposing. The developer then guides the volunteer through established project processes for contributing to the code, including which mailing lists to use and how to submit their feature as a patch.

Case C. In December 2016, a developer in the Airflow project community raised concerns over the integration testing infrastructure offered by Apache, citing unnecessary obstacles it imposes on volunteer contributors. The developer offers their own resources as an alternative, with the caveats that they will administer it and control access. This triggers a discussion on the technical merits of the developer's concerns, and a policy discussion as to whether ASF permits the use of unofficial alternative infrastructure options. Several developers conclude that a transition is technically advisable and institutionally sound, and the community transitions to the alternative integration testing framework.

Case D. In September 2015, the Kalumet project received a proposal that it be retired from ASFI, after its code had been languishing for several months. Contributors agreed upon retirement almost unanimously. One contributor, identifying features of the project that could be of use to other ASF and ASFI projects, suggested distributing key parts of its functionality to other active projects.
The retirement vote is ultimately followed by developer effort distributing Kalumet's assets.

These cases illustrate how institution-side policy discussion and socio-technical-side project contributions interact, with developments on the artifact motivating policy discussions, and policy constraints steering developer effort. With longitudinal data on both institutional and socio-technical variables, we now transition to a quantitative investigation of these relationships.

5.4 RQ3: Are periods of increased Institutional Statement frequency followed by changes in the project organizational structure, and vice versa?

In the previous RQs, we conducted exploratory and qualitative studies of the IS extraction technology, and of IS and socio-technical variable changes over time. In this section, we investigate the temporal relationship between our measures of institutional governance and organizational structure as OSS projects progress along their incubation trajectories. As predicted by contingency theory, our hypothesis is that during project evolution, developers and mentors must make time for decisions related to their organizational structure, contingent on ASF-required institutional arrangements and governance. That is, incubating projects change their organizational structure based on the institutional norms and rules being discussed, as required of them as potential new members of the ASF community. And, vice versa, organizational changes can incite follow-up discussions about institutional processes. To test RQ3, we use the pair-wise Granger causality test with a lag order of 2. We run the test for all pairs of institutional statement and socio-technical variables, resulting in 36 separate tests for the graduated project set and 36 for the retired one. We adjust our p-values for multiple hypothesis testing to control the false discovery rate, using the Benjamini-Hochberg procedure [14], and only consider results significant at p-val < 0.001.

The results are summarized in Figure 4, where a directed edge from node $X$ to node $Y$ indicates that $X$ Granger-causes $Y$, i.e., change in $X$ is a precursor to change in $Y$. Also, as discussed in Section 4.6, the Granger approach we used is not a complete test of causality, but it does yield an effect and its directionality, although without effect size or sign.

Fig. 4. The Granger causality between Institutional Statements and Socio-Technical networks. The blue/purple directed links indicate Granger causality from ST/IS measures, respectively. A green bi-directional link indicates a two-way significant temporal relationship (p-val < .001). Graduated projects seem to have fewer links from ST variables to IS variables, suggesting a more unidirectional flow from institutional to socio-technical changes in successful projects (color online).

We observe a large number, 31 (out of 72 total), of Granger-causal relationships between the measures of institutional governance and organizational structure. Of those 31, 15 are from the graduated set and 16 from the retired set, and 8 of the relationships are shared between the sets. We conclude that there is significant Granger-causality between changes in institutional governance discussions and the organizational structure of the projects. We note 8 bidirectional relationships\(^{13}\), accounting for 16 of the directed links; the remaining 15 links are unidirectional.

\(^{13}\)Bidirectional causality indicates feedback of some sort, e.g., supply causes demand, and demand, in turn, causes supply.
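The multiple-testing correction described above can be sketched with statsmodels (the raw p-values here are illustrative stand-ins for the 36 pairwise Granger tests of one project set):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(3)
# Illustrative p-values, skewed toward small values as a stand-in.
raw_p = rng.uniform(size=36) ** 3

# Benjamini-Hochberg FDR control at the study's 0.001 threshold.
reject, adj_p, _, _ = multipletests(raw_p, alpha=0.001, method="fdr_bh")
print(f"{reject.sum()} of {raw_p.size} Granger links significant after BH")
```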
We look at graduated projects first. Interestingly, Figure 4, top, shows that the number of ISs from mentors, committers, and contributors has effects on the technical network, and vice versa for the latter two. Namely, ISs from all roles (mentors, committers, and contributors) Granger-cause changes in the technical networks, i.e., in the developer file productivity (t_num_file_per_dev) and total number of coding files changed (t_num_file_nodes) variables. Mentor ISs additionally Granger-cause changes in the number of developers (t_num_dev_nodes). This is consistent with ASFI expectations that a mentor's emails provide advice and engage people, and, conversely, that a drop in engagement may elicit mentors' engagement. Mentors usually do not code, which is presumably why they Granger-cause, but do not appear in feedback relationships with, any of the technical network variables.

Notably absent, however, are links from mentor and contributor ISs into the social network variables. Only committer ISs (bidirectionally) Granger-cause changes in the social network density, which, perhaps, simply indicates that ISs from committers induce substantial traffic in the social network, which in turn gets committers to discuss policy and rule issues. We have observed situations where mentors are likely to intervene in projects when the projects become less active (either socially or technically)\(^{14}\). On the other hand, it could also be that a mentor is reacting to some particular broader discussion among developers, e.g., one on a monthly report.

Together, the above tells a story of the importance of changes in any IS variable to the technical networks. Surprisingly, mentor IS changes are not as consequential to the social network, seemingly at odds with ASF's community-first goals. Thus, there may be room to enhance community engagement with mentors, and vice versa.

**RQ3 Summary:** In both graduated and retired projects, there are no inputs from mentor or contributor ISs into the social network variables, even though there are IS inputs into all technical network variables. Retired projects exhibit less bidirectionality between ST and IS variables. Finally, and interestingly, among retired projects there are causal inputs into contributor ISs from both the social and technical variables. This is not the case for the graduated projects.

6 DISCUSSION

In this study, we use individual institutional prescriptions, Institutional Statements (ISs), and Socio-Technical (ST) network features to reason about OSS project sustainability. OSS projects are a form of digital public good which, like other shared resources (e.g., water, forests, marine resources), can be subject to degradation through over-harvesting, e.g., in the form of free-riders who take advantage of OSS but do not contribute the resources required for the development and maintenance of the software. Ostrom's work illuminated the fact that many communities avoid the dreaded 'Tragedy of the Commons', and other collective action problems, through the hard work of designing and implementing self-governing institutions. In that context, the ASF is a nonprofit foundation that, through its incubation program, encourages nascent OSS projects to follow some ASF-guided operational-level rules or policies around their self-governance.
The OSS projects that join the ASF incubator trade some of the freedom of unlimited institutional choice in exchange for incubator resources that increase their chances of enduring the collective action problems that characterize OSS development [36], and of becoming sustainable in the long run.

We found that in the ASF Incubator, the amount of institutional statements and the levels of socio-technical variables are associated with projects' graduation outcomes, suggesting that measures of institutional governance and organizational structure can signal information about sustainability.

---

\(^{14}\)An example of a mentor intervening in the project Warble: https://lists.apache.org/thread/x6h8pzhmfwtyy354ml1xm9sylq4y5r7l

In particular, in RQ1, the Mann-Whitney U test shows that graduated projects have significantly more ISs from all three types of participants (committers, contributors, and project mentors) than retired projects. This, presumably, is indicative of more active or intentional self-governance. In theoretical and empirical work on commons governance, it is well documented that getting self-governing institutions 'right' is hard work and takes time and effort [32]. This is consistent with a narrative in which participants in graduated projects debate and work harder on their project's operational-level institutional design.

Recent work has shown that ASFI graduated and retired projects have sufficiently different socio-technical structures [46] that graduation can be predicted early on in development with 85+% accuracy. The results in RQ2 show that, for the first 3 months of incubation, developer nodes in the social networks of graduated projects increase at a higher rate (means increase from 10.1 to 17.1 for graduated projects, versus from 7.3 to 9.1 for retired ones), suggesting that graduated projects were able to keep developers contributing more actively or to recruit more new members. On the other hand, for the first 3 months, we also found that the amount of Institutional Statements by mentors increases in graduated projects and decreases in retired projects (from 19.7 to 22.7 vs. from 22.6 to 14.6, respectively), suggesting that the initial help from a project's mentors is important.

To further study the effects of ISs, we performed a deep dive into IS topics. We found that the institutionally relevant topics in graduated projects differ from those in retired projects; specifically, the topic of documentation (topic 7) is more prevalent in graduated projects than in retired ones. On the other hand, we found that the mentorship topic (topic 11) is significantly more prevalent in retired projects than in graduated ones, signaling that the retired projects might be struggling during incubation. Combined with the fact that graduated projects have more developer nodes in both the social and technical networks, these findings suggest that graduated projects have more capacity and energy to attend to non-coding issues, like documentation, than retired projects do. However, even among graduated projects there is still diversity in the institutional statements. Thus, as predicted by contingency theory, as well as by Ostrom's theory of institutional diversity [33], a one-size-fits-all solution for a successful trajectory toward sustainability is not likely.
Instead, future work should focus on gathering larger corpora of data, to be able to resolve individual or small-group differences among sustainable projects.

Our framework allowed us to combine the IS and STS structures and study them together over time. With it, in RQ3, we found two-way causal correlations between socio-technical variables and ISs over time, arguably indicating that OSS projects' socio-technical structure and governance structure evolve together, as a coupled system. In addition, our methods point to a way to study possible interventions in underperforming projects. Specifically, the finding that in retired projects there are bi-directional links from committers' ISs to all three features of the technical networks (i.e., t_num_dev_nodes, t_num_file_per_dev, t_num_file_nodes) suggests that increases in committers' ISs are interleaved with changes in features of the socio-technical networks.

As for design implications, in addition to the current categories of mailing lists in the ASF incubator (e.g., 'commit', 'dev', 'user', etc.), there could be a benefit to creating a separate mailing list for institution-related discussions, to help committers (as well as mentors and contributors) participate in those discussions in a timely manner. This could be made more useful with technology for self-monitoring, with which project participants could monitor a project's digital traces and discussions in order to react more quickly to episodic events. Some such tools have already been created for socio-technical networks in ASFI projects [34], and could be extended to include ISs as well. Such tools can help identify entry points and targets for interventions, whereby underperforming projects could be nudged, internally or externally, via rules or advice, to adjust their trajectories.

Contributions to Institutional Analysis and Socio-Technical System Theory. Coming full circle, our findings also point to ways in which the theories we started from can be refined or extended. We find, in Sect. 5.4, evidence that the features of OSS projects' socio-technical systems co-change with the amount of Institutional Statements in them, and that the co-change relationships are sparse. This evidence of co-change implies that OSS projects' structure and governance form a (loosely) coupled system. From a controllability point of view, a dynamically coupled system refines Smith et al.'s mechanistic, binary notion of 'inside' and 'outside' interventions [40].

Our findings also suggest that for OSS projects, adopting additional rules and norms (e.g., by joining ASFI) can be worth the loss of some freedoms, as the Institutional Statements (Sect. 5.2, 5.3, 5.4) seem to serve to organize the project's actions and discussions, as predicted by Siddiki et al. [39] and Crawford and Ostrom [10]. Thus, our findings tie in with, and potentially extend, the Institutional Analysis and Development (IAD) view, suggesting that the feedback between socio-technical system structure and institutional governance is sufficiently direct and significant that the two should be considered as a unit in further studies.

More practically, our institutional statement predictor, although still a work in progress, can effectively identify atomic elements of self-governance.
As such, it can be used as a tool to provide quantitative data for applying institutional analysis and development (IAD) more generally, e.g., to OSS projects outside of ASF, or to self-governed systems with public documents and discussion forums.

7 THREATS TO VALIDITY

First, our data comes from only a few hundred ASF incubator projects. Thus, generalizing the implications beyond ASF, or even beyond the ASF Incubator projects, carries potential risks; for example, OSS projects in other incubator programs may not have mentors. Expanding the dataset beyond the ASF incubator, e.g., with additional projects from other OSS incubator programs, could lower this risk. Second, we do not consider communication channels other than the ASF mailing lists, e.g., in-person meetings, website documentation, private emails, etc. However, ASF mandates the use of the public mailing lists for most project discussions\(^{15}\), a policy that ensures a particularly low risk of missing institutional or socio-technical information. Third, annotations of the Institutional Statements (ISs) can be biased by individual annotators; however, we gave the annotators sufficient training and reference documentation, which lowers this risk. We expect the performance of the classifier to improve as we increase the size of the training set and better incorporate contextual information, and we plan to distinguish types of ISs in future work. In OSS projects, developers may use different email addresses or aliases, which complicates the identification of distinct developers; ASF's practice of assigning, and insisting on the use of, a unique apache.org domain email address reduces this risk\(^{16}\).

Finally, as noted in Sect. 4, there are likely cases where OSS projects that have retired from the ASF Incubator program still go on to become sustained over time. In these instances, some OSS projects entering the ASFI may simply not be a good fit for the ASF culture and institutional requirements or policies, and ultimately retire as a result. In this paper, we explicitly use graduation as a measure of sustainability, given that this is an ultimate goal of the ASFI: to create projects that can indeed be sustainable. But we recognize that a few retired projects could still become sustainable by following a different path than association with ASF.

---

\(^{15}\)The Apache Way: http://theapacheway.com/on-list/

\(^{16}\)ASF committer emails: https://infra.apache.org/committer-email.html

8 CONCLUSION

Understanding why OSS projects cannot meet the expectations of nonprofit foundations may help others improve their individual practice, organizational management, and institutional structure. More importantly, understanding the relationship between institutional design and socio-technical aspects in OSS can bring insights into the potential sustainability of such projects. Here we showed that quantitative network science features can capture the organizational structure of how developers collaborate and communicate through the artifacts they create. Combining the two perspectives, socio-technical measures and institutional analysis, we leveraged the unique affordances of the Apache Software Foundation's OSS Incubator to extend the modeling of OSS project sustainability, drawing on a novel longitudinal dataset, a vast text and log corpus, and extrinsic labels for the success or failure of project sustainability.

ACKNOWLEDGEMENTS

The authors thank the reviewers for their constructive comments.
This material is based upon work supported by the National Science Foundation under GCR grant no. 2020751 and no. 2020900.\n\nREFERENCES\n\n[1] Barclay, D. W. Interdepartmental conflict in organizational buying: The impact of the organizational context. *Journal of Marketing Research* 28, 2 (1991), 145\u2013159.\n\n[2] Benkler, Y. *The wealth of networks*. Yale University Press, 2008.\n\n[3] Bird, C., Gourley, A., Devanbu, P., Gertz, M., and Swaminathan, A. Mining email social networks. In *Proceedings of the 2006 international workshop on Mining software repositories* (2006), pp. 137\u2013143.\n\n[4] Bird, C., Nagappan, N., Gall, H., Murphy, B., and Devanbu, P. Putting it all together: Using socio-technical networks to predict failures. In *2009 20th International Symposium on Software Reliability Engineering* (2009), IEEE, pp. 109\u2013119.\n\n[5] Bird, C., Pattison, D., D\u2019Souza, R., Filkov, V., and Devanbu, P. Latent social structure in open source projects. In *Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of software engineering* (2008), pp. 24\u201335.\n\n[6] Blomquist, W., et al. *Dividing the waters: governing groundwater in Southern California*. ICS Press Institute for Contemporary Studies, 1992.\n\n[7] Cheung, Y.-W., and Lai, K. S. Lag order and critical values of the augmented dickey\u2013fuller test. *Journal of Business & Economic Statistics* 13, 3 (1995), 277\u2013280.\n\n[8] Cohan, A., Beltagy, I., King, D., Dalvi, B., and Weld, D. S. Pretrained language models for sequential sentence classification. In *Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing* (Hong Kong, China, 2019), Association for Computing Machinery, p. 3693\u20133699.\n\n[9] Cooke-Davies, T. The \u201creal\u201d success factors on projects. *International journal of project management* 20, 3 (2002), 185\u2013190.\n\n[10] Crawford, S., and Ostrom, E. A grammar of institutions. *American Political Science Review* 89, 3 (1995), 582\u2013600.\n\n[11] Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. *arXiv preprint arXiv:1810.04805* (2018).\n\n[12] Ducheneaut, N. Socialization in an open source software community: A socio-technical analysis. *Computer Supported Cooperative Work (CSCW)* 14, 4 (2005), 323\u2013368.\n\n[13] Dumitrescu, E.-I., and Hurlin, C. Testing for granger non-causality in heterogeneous panels. *Economic modelling* 29, 4 (2012), 1450\u20131460.\n\n[14] Ferreira, J., and Zwinderman, A. On the benjamini\u2013hochberg method. *The Annals of Statistics* 34, 4 (2006), 1827\u20131849.\n\n[15] Fischer, G., and Herrmann, T. Socio-technical systems: a meta-design perspective. *International Journal of Sociotechnology and Knowledge Development (IJSKD)* 3, 1 (2011), 1\u201333.\n\n[16] Fleischman, F., Loken, B., Garcia-Lopez, G., and Villamayor-Tomas, S. Evaluating the utility of common-pool resource theory for understanding forest governance and outcomes in Indonesia between 1965 and 2012. *International Journal of the Commons* 8, 2 (2014).\n\n[17] Frischmann, B., Madison, M., and Strandburg, K. *Governing Knowledge Commons*. Oxford University Press, 2014.\n[18] Gonz\u00e1lez-Barahona, J. M., Lopez, L., and Robles, G. Community structure of modules in the apache project. In Proceedings of the 4th International Workshop on Open Source Software Engineering (2004), IET, pp. 
44\u201348.\n\n[19] Gruby, R. L., and Basurto, X. Multi-level governance for large marine commons: politics and polycentricity in palau\u2019s protected area network. Environmental science & policy 33 (2013), 260\u2013272.\n\n[20] Hardin, G. The tragedy of the commons: the population problem has no technical solution; it requires a fundamental extension in morality. science 162, 3859 (1968), 1243\u20131248.\n\n[21] Herrmann, T., Hoffmann, M., Kunau, G., and Loser, K.-U. A modelling method for the development of groupware applications as socio-technical systems. Behaviour & Information Technology 23, 2 (2004), 119\u2013135.\n\n[22] Hess, C., and Ostrom, E. Understanding knowledge as a commons: From theory to practice. JSTOR, 2007.\n\n[23] Hissam, S., Weinstock, C. B., Plakosh, D., and Asundi, J. Perspectives on open source software. Tech. rep., Carnegie Mellon Univ Pittsburgh PA - Software Engineering Inst, 2001.\n\n[24] Joblin, M., and Apel, S. How do successful and failed projects differ? a socio-technical analysis. ACM Trans. Softw. Eng. Methodol. (dec 2021).\n\n[25] Joslin, R., and M\u00fcller, R. The impact of project methodologies on project success in different project environments. International Journal of Managing Projects in Business (2016).\n\n[26] Lehtonen, P., and Martinsuo, M. Three ways to fail in project management and the role of project management methodology. Project Perspectives 28, 1 (2006), 6\u201311.\n\n[27] Lopez, J. H. The power of the adf test. Economics Letters 57, 1 (1997), 5\u201310.\n\n[28] Narduzzo, A., and Rossi, A. The role of modularity in free/open source software development. In Free/Open source software development. Igi Global, 2005, pp. 84\u2013102.\n\n[29] Olson, M. The logic of collective action [1965]. Contemporary Sociological Theory 124 (2012).\n\n[30] O\u2019Reilly, T. Lessons from open-source software development. Communications of the ACM 42, 4 (1999), 32\u201337.\n\n[31] Ostrom, E. Governing the commons: The evolution of institutions for collective action. Cambridge university press, 1990.\n\n[32] Ostrom, E. Understanding institutional diversity. Princeton university press, 2009.\n\n[33] Ostrom, E., Janssen, M., and Andereis, J. Going beyond panaceas. Proceedings of the National Academy of Sciences 104, 39 (2007), 15176\u201315178.\n\n[34] Ramchandran, A., Yin, L., and FilKov, V. Exploring apache incubator project trajectories with apex. In 2022 IEEE/ACM 19th International Conference on Mining Software Repositories (MSR) (2022), IEEE, p. Accepted.\n\n[35] Ropohl, G. Philosophy of socio-technical systems. Techn\u00e9: Research in Philosophy and Technology 4, 3 (1999), 186\u2013194.\n\n[36] Schweik, C. M., and English, R. Tragedy of the foss commons? investigating the institutional designs of free/libre and open source software projects. First Monday (2007).\n\n[37] Schweik, C. M., and English, R. C. Internet success: a study of open-source software commons. MIT Press, 2012.\n\n[38] Sen, A., Atkisson, C., and Schweik, C. M. Cui bono: Do open source software incubator policies and procedures benefit the projects or the incubator? Available at SSRN (2021).\n\n[39] Siddiki, S., Heikkila, T., Weible, C. M., Pacheco-Vega, R., Carter, D., Curley, C., Deslatte, A., and Bennett, A. Institutional analysis with the institutional grammar. Policy Studies Journal (2019).\n\n[40] Smith, A., and Stirling, A. Moving outside or inside? objectification and reflexivity in the governance of socio-technical systems. 
Journal of Environmental Policy & Planning 9, 3-4 (2007), 351\u2013373.\n\n[41] Surian, D., Tian, Y., Lo, D., Cheng, H., and Lim, E.-P. Predicting project outcome leveraging socio-technical network patterns. In 2013 17th European Conference on Software Maintenance and Reengineering (2013), IEEE, pp. 47\u201356.\n\n[42] Trist, E. The evolution of socio-technical systems: A conceptual framework and an action research program. Ontario Ministry of Labour, 1981.\n\n[43] Turner, J. R., and M\u00fcller, R. Communication and co-operation on projects between the project owner as principal and the project manager as agent. European management journal 22, 3 (2004), 327\u2013336.\n\n[44] \u0158eh\u016f\u0159ek, R., Sojka, P., et al. Gensim\u2014statistical semantics in python. Retrieved from genism. org (2011).\n\n[45] Wearn, S., and Stanbury, A. A study of the reality of project management: Wg morris and gh hough, john wiley, uk (1987) e 29.95, isbn 0471 95513 pp 295. International Journal of Project Management 7, 1 (1989), 58.\n\n[46] Yin, L., Chen, Z., Xuan, Q., and FilKov, V. Sustainability forecasting for apache incubator projects. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (New York, NY, USA, 2021), Association for Computing Machinery, p. 1056\u20131067.\n\n[47] Yin, L., Zhang, Z., Xuan, Q., and FilKov, V. Apache software foundation incubator project sustainability dataset. In 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR) (2021), IEEE, pp. 595\u2013599.\n\n[48] Yu, H., and Yang, J. A direct lda algorithm for high-dimensional data\u2014with application to face recognition. Pattern recognition 34, 10 (2001), 2067\u20132070.\n\nReceived July 2021; revised November 2021; accepted April 2022", "source": "olmocr", "added": "2025-06-23", "created": "2025-06-23", "metadata": {"Source-File": "/home/nws8519/git/adaptation-slr/studies_pdfs/018-yin.pdf", "olmocr-version": "0.1.76", "pdf-total-pages": 23, "total-input-tokens": 61854, "total-output-tokens": 20526, "total-fallback-pages": 0}, "attributes": {"pdf_page_numbers": [[0, 3651, 1], [3651, 7674, 2], [7674, 11349, 3], [11349, 15432, 4], [15432, 19723, 5], [19723, 23240, 6], [23240, 27441, 7], [27441, 31679, 8], [31679, 36124, 9], [36124, 40271, 10], [40271, 44169, 11], [44169, 48405, 12], [48405, 51704, 13], [51704, 55422, 14], [55422, 58204, 15], [58204, 60641, 16], [60641, 64459, 17], [64459, 66923, 18], [66923, 70962, 19], [70962, 75489, 20], [75489, 79416, 21], [79416, 83662, 22], [83662, 89034, 23]]}}
{"id": "97a9b971f51a5f786c23a08acab7889e0d8b1eb0", "text": "An empirical study on downstream workarounds for cross-project bugs\n\nHui Ding Wanwangying Ma Lin Chen Yuming Zhou Baowen Xu\nState Key Laboratory for Novel Software Technology\nNanjing University, China\ndinghui85@gmail.com, wwyma@smail.nju.edu.cn, {lchen, zhouyuming, bwxu}@nju.edu.cn\n\nAbstract\u2014GitHub has fostered complicated and enormous software ecosystems, in which projects depend on and co-evolve with each other. An error in an upstream project may affect its downstream projects through inter-dependencies, forming cross-project bugs. Though the upstream developers should fix the bugs on their side, proposing a workaround, i.e., a temporary solution in the downstream project is a common practice for the downstream developers. In this study, we empirically investigated the characteristics of downstream workarounds in the scientific Python ecosystem. Combining the statistical comparisons and manual inspection, we have the following three main findings. First, in general, the workarounds and the corresponding upstream fixes are significantly different in code size and code structure. Second, there are three kinds of cross-project bugs that the downstream developers usually work around. Last, four types of common patterns are identified from the investigated workarounds. The findings of this study lead to better understanding of cross-project bugs and the practices of developers in software ecosystems.\n\nKeywords\u2014GitHub ecosystems; cross-project bugs; workarounds; practices\n\nI. INTRODUCTION\n\nBenefiting from the social coding capabilities of GitHub, software development on GitHub has evolved beyond a single project into socio-technical ecosystems [1]. Projects rely on the infrastructure or functional components provided by other projects, forming complex inter-project dependencies. In this way, some bugs in the upstream projects may affect their downstream projects through the dependencies. This phenomenon was confirmed by Ma et al. [2]. In their study, they investigated cross-project correlated bugs, i.e., causally related bugs reported to different projects in scientific Python ecosystem on GitHub, focusing on how developers coordinate to triage and fix this kind of bugs.\n\nIn the context of cross-project bugs, it is no doubt that the upstream project where the bug roots should provide a radical cure. However, the affected downstream projects usually offer a workaround, i.e., a temporary solution locally to bypass the upstream error. Ma et al. posted a questionnaire in which they asked what the downstream developers usually did to deal with cross-project bugs. The result indicated that 89.3% of the respondents chose to propose a temporary workaround, which was proven to be the most common practice [2].\n\nWorkarounds are important in two folded [2]. First, it can be used to avoid the long-lasting impact of an upstream bug. A workaround must be implemented if the upstream team is not willing or able to fix the bug quickly, and it allows the downstream project to temporarily suppress the upstream bug. Second, adding a workaround for an upstream bug enables the downstream project to support buggy upstream version without affecting the end users. As many users may still use an old version of the upstream project, the downstream developers cannot rely on a fix in the next upstream release. 
Therefore, the downstream developers have to work around bugs regardless of whether they have already been fixed upstream.\n\nDespite the wide use and importance of workarounds for cross-project bugs, little work has paid attention to this issue. Studying workarounds will help to understand not only the fixing process of cross-project bugs, but also the coordination between projects in a software ecosystem. Therefore, we conduct this study to investigate the characteristics of the downstream workarounds in the context of cross-project bugs.\n\nWe base our study on the scientific Python ecosystem on GitHub. For a cross-project bug, we refer to the patch injected into the buggy upstream project as the upstream fix, and to the temporary solution provided in the affected downstream project as the downstream workaround. We investigate the workarounds from three aspects. First, we compare the code size and design of the workarounds with those of the corresponding upstream fixes. Second, we inspect whether the cross-project bugs that were worked around in downstream projects have something in common. Third, we investigate whether software practitioners developed the workarounds in some common ways.\n\nThe main contributions of this study are as follows. First, we extract 60 downstream workarounds in the scientific Python ecosystem. Second, we identify three kinds of cross-project bugs that the downstream developers usually work around. Third, we summarize four common workaround patterns. Last, we provide several design requirements for workaround-supporting tools.\n\nThe rest of the paper is organized as follows. Section II describes related work. Section III presents our research methodology, and Section IV shows our empirical results. We propose further discussions on our findings in Section V, discuss threats to validity in Section VI, and conclude in Section VII.\nII. RELATED WORK\n\nA. Cross-project Bugs\n\nWith the development of software ecosystems, more and more cross-project bugs appear and attract the attention of an increasing number of researchers.\n\nSome existing studies showed that cross-project bugs brought many troubles to ecosystem developers. Decan et al. [3] reported that the developers in R ecosystems found it more and more painful when upstream packages broke. Adams et al. [4] indicated that the core integration activity for open source distributions was synchronizing with newer upstream versions. To avoid cross-project bugs, developers had to pay close attention to the synchronization process. Bavota et al. [5] found that upstream upgrades could have strong effects on downstream projects when there were general dependencies between them. Their study showed that a large amount of downstream code had to be modified when the upstream project changed if the downstream project depended on the upstream framework or general services. In that case, the upstream bugs would have a wide impact on the downstream projects.\n\nOther studies focused on the coordination between developers in different projects when fixing cross-project bugs. Villarroel et al. [6] leveraged the reviews of app users to help developers recognize downstream demand. They classified and prioritized the downstream reviews, so that the upstream developers were able to catch the important bugs quickly. Ma et al. [2] studied how developers fixed cross-project correlated bugs in the scientific Python ecosystem.
Combining manual inspection and the results of an online survey, they revealed how developers, especially those on the downstream side, tracked the root causes of cross-project bugs and dealt with them to eliminate their bad effects. Our study builds on and extends that work. We focus on a specific but common practice of the downstream developers when facing cross-project bugs, i.e., proposing a workaround.\n\nB. Blocking Bugs\n\nAnother special type of bug is the blocking bug, which is to some extent similar to cross-project bugs. Blocking bugs prevent other bugs (in the same or other projects) from being fixed. This often happens because of a dependency relationship among software components: the developers cannot fix their bugs because the modules that they are fixing depend on other modules that have unresolved bugs. Due to their severe impact, some researchers have turned their attention to blocking bugs.\n\nValdivia Garcia and Shihab [7] found that it took two to three times longer to fix blocking bugs than non-blocked bugs. They then employed decision trees to predict whether a bug is a blocking bug or not. They extracted 14 kinds of features to construct the predictor and evaluated which features were most influential in indicating blocking bugs.\n\nLater, Xia et al. [8] proposed a novel method named ELBlocker to identify blocking bugs with the class imbalance phenomenon taken into account. ELBlocker utilized more features and combined multiple classifiers to learn an appropriate imbalance decision boundary. ELBlocker outperformed the method in [7] by up to 14.7% in F-measure.\n\nUnlike blocking bugs, which prevent the fixing of bugs in the dependent modules, cross-project bugs occur in upstream projects but affect the normal operation of the downstream projects. For the affected downstream modules/projects, the developers attempt to take some action to be released from the blocking/cross-project bugs in other components. In this paper, we investigate the downstream practices when facing cross-project bugs.\n\nC. Design of Bug Fixes\n\nFixing software bugs is an important activity during software maintenance. Developers devote substantial effort to designing bug fixes, which reflect the developers' expertise and experience. Various studies investigated the nature and design of bug fixes. Zhong and Su [9] extracted and analyzed more than 9000 real-world bug fixes from six Java projects. They obtained 15 findings which offer insights for automatic program repair. Pan et al. [10] explored the underlying bug fix patterns and identified 27 bug fix patterns that were amenable to automatic detection. Park et al. [11] analyzed bugs which were fixed more than once to understand the characteristics of incomplete patches. They revealed that predicting supplementary patches was a difficult problem. Jiang et al. [12] conducted a study on the characteristics of Linux kernel patches that could explain patch acceptance and reviewing/integration time. Misirli et al. [13] proposed a measure to study the impact of fix-inducing changes. They found that the lines of code added, the number of developers who worked on a change, and the number of prior modifications on the files modified during a change were the best indicators of high-impact fix-inducing changes. Echeverria et al.
[14] evaluated developers' performance on fixing bugs and propagating the fixes to other products in an industrial Software Product Line.\n\nBased on the different characteristics of bug fixes, researchers developed various automatic tools to support bug repair. Le Goues et al. [15,16] used genetic programming to repair bugs in C programs, and evaluated what fraction of bugs could be repaired automatically. They generated a large, indicative benchmark set for systematic evaluations. Mechtaev et al. [17] presented a semantics-based repair method applicable to large-scale real-world software. Gu et al. [18] considered the bad-fix problem and implemented a prototype that automatically detects bad fixes for Java programs.\n\nWhen fixing bugs, developers may have different options to design the bug fix. Leszak et al. [19] pointed out that some defects were not fixed by correcting the real error-causing component, but rather by a workaround injected at another location. An online resource gives a clear description of the workaround [20]: “A workaround is a far less elegant solution to the problem. Typically, a workaround is not viewed as something that is designed to be a panacea, or cure-all, but rather as a crude solution to the immediate problem. As a temporary fix, a workaround will do very well until a suitable permanent fix can be implemented by project management personnel.” Murphy-Hill et al. [21] studied why a developer might choose a workaround instead of a fix at the real location. They summarized six factors: risk management, interface breakage, consistency, user behavior, cause understanding, and social factors. Some other studies also paid attention to the phenomenon of workarounds. Ko et al. [22] found that if a bug had a known workaround, developers often focused on more severe bugs. Berglund [23] indicated that bugs could be worked around and workarounds were relevant in early stages of the bug fixing process.\n\nDifferent from most existing studies which investigated the design of fixes for within-project bugs, our study concentrates on the characteristics of downstream workarounds in the context of cross-project bugs.\n\nIII. RESEARCH METHODOLOGY\n\nIn this section, we first introduce how we collected data in the study. Then we present the research questions. Finally, we describe the research methods used to investigate the questions.\n\nA. Data Source\n\nThe cross-project bugs under investigation were collected by Ma et al. [2]. The data are available online1. The dataset contains 271 pairs of cross-project bugs gathered from the scientific Python ecosystem on GitHub. Every pair includes an upstream issue reported to the root-cause project and a downstream issue reported to the affected project. Specifically, these cross-project bugs involve 204 projects including seven core libraries in the ecosystem, that is, IPython2, NumPy3, SciPy4, Matplotlib5, Pandas6, Scikit-learn7, and Astropy8.\n\nSince our study focuses on the workarounds, we are only interested in the cross-project bugs for which the downstream developers have provided a workaround. In order to extract the data we needed, we manually read all the bug reports on the downstream side of the 271 pairs of bugs. If the downstream developers were willing to propose a workaround, they were very likely to leave related information in the issue reports. For example, a developer of IPython suffering a bug of Setuptools commented, “I’ll open an Issue on setuptools to deal with this, and figure out what the best workaround in IPython should be.” (ipython/ipython#8804)
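Such manual reading can be shortlisted, though not replaced, by a simple keyword filter over the issue tracker. A rough sketch using the GitHub REST API (repository names are parameters; pagination, authentication, and error handling are omitted):\n\n```python\n# Flag issues whose title or body mentions a workaround, as candidates\n# for the kind of manual inspection described above.\nimport requests\n\ndef candidate_issues(owner: str, repo: str):\n    url = f'https://api.github.com/repos/{owner}/{repo}/issues'\n    for issue in requests.get(url, params={'state': 'all'}, timeout=30).json():\n        text = ((issue.get('title') or '') + ' ' + (issue.get('body') or '')).lower()\n        if 'workaround' in text or 'work around' in text:\n            yield issue['number']\n```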
Two of the authors of this paper carried out this task and found 60 pairs of cross-project bugs to investigate further in this study.\n\nFor the 60 pairs of bugs, we concentrated on their downstream workarounds and the corresponding upstream fixes. Usually, the upstream issue will link to the bug-fix commits if it has been repaired. Also, if the downstream issue was worked around, the commits including the workaround would be indicated. By manually inspecting the issue reports, the two authors linked every pair of closed cross-project bugs with the commits containing the fix/workaround. Note that nine cross-project bugs have not been fixed by the upstream projects. Therefore, in total, we collected 60 downstream workarounds and 51 upstream fixes.\n\nB. Research Questions\n\nThe aim of this study is to investigate the characteristics of downstream workarounds in the context of cross-project bugs. In particular, we attempt to answer the following three research questions:\n\nRQ1: Are there differences between downstream workarounds and the corresponding upstream fixes?\n\nCompared with the upstream fix, the workaround is injected in a different project and serves a different purpose. Therefore, is the design of the workaround different from that of the fix? We compared them in two aspects: the code size and the code structure.\n\nRQ2: Do the cross-project bugs that downstream developers work around have some common features?\n\nAs stated, not all of the cross-project bugs have workarounds. What features, then, do these 60 bugs with workarounds have in common? In RQ2, we sought to find the answer.\n\nRQ3: Do the workarounds have some common patterns?\n\nIn RQ3, we attempted to find whether downstream developers worked around the upstream bugs in some common ways.\n\nC. Research Methods\n\n1) Quantitative analysis methods\n\nIn RQ1, the Wilcoxon signed-rank test and the Cliff’s δ effect size served to compare the code size between the upstream fixes and the downstream workarounds.\n\nThe Wilcoxon signed-rank test is a non-parametric statistical hypothesis test used to compare whether two matched groups of data are identical [24]. The paired samples in our study are the sizes (concerning the number of modified files or the number of changed lines of code) of the downstream workarounds and upstream fixes. We set the null hypothesis $H_0$ and its alternative hypothesis $H_1$ as follows:\n\n$H_0$: The number of modified files / the number of changed lines of code in the downstream workarounds is the same as that in the upstream fixes.\n\n$H_1$: The number of modified files / the number of changed lines of code in the downstream workarounds is significantly different from that in the upstream fixes.\n\nWe assessed the test results at the significance level of 0.05. If the p-value obtained from the Wilcoxon signed-rank test was lower than 0.05, the sizes of workarounds and fixes were considered significantly different.
Together with the median values of the sizes, we were able to decide whether the size of the workaround was smaller than the size of its corresponding fix.\n\n---\n\n1 https://github.com/njuap/ICSE2017\n2 http://ipython.org, https://github.com/ipython/ipython\n3 http://www.numpy.org, https://github.com/numpy/numpy\n4 http://www.scipy.org/scipylib, https://github.com/scipy/scipy\n5 http://matplotlib.org, https://github.com/matplotlib/matplotlib\n6 http://pandas.pydata.org, https://github.com/pydata/pandas\n7 http://scikit-learn.org, https://github.com/scikit-learn/scikit-learn\n8 http://www.astropy.org, https://github.com/astropy/astropy\n\nFurthermore, we used the Cliff’s $\\delta$ effect size to measure the magnitude of the difference between the sizes of workarounds and fixes. Cliff’s $\\delta$ provides a simple way of quantifying the practical difference between two groups [25]. Of all kinds of effect sizes, Cliff’s $\\delta$ is the most direct and simple non-parametric variety [26]. By convention, the magnitude of the difference is considered either trivial ($|\\delta| < 0.147$), small (0.147–0.33), moderate (0.33–0.474), or large (> 0.474) [27].\n\n2) Qualitative analysis\n\nFor RQ2, RQ3, and part of RQ1, we performed a qualitative analysis to investigate the questions. Two authors manually inspected the issue reports and the code of fixes/workarounds for the cross-project bugs.\n\nThe two authors first individually completed the task following the same procedure and criteria. They reviewed the issue reports and code carefully, then executed the existing test cases provided by the developers to keep track of traces and to observe the input/output. During this procedure, they wrote down the necessary information: the bug information (bug type, root cause, bug impact, and participants), the bug context (related methods, test cases, traces, and input/output), and the workaround and fix strategies. They also wrote down their findings.\n\nAfter the individual investigation, they came together to discuss their findings and draw conclusions.\n\nIV. RESEARCH RESULTS\n\nA. RQ1: Differences Between Fixes and Workarounds\n\nIn order to compare the upstream fixes and the downstream workarounds, we first statistically compared their sizes in terms of the number of modified files and the number of modified lines of code. Then, we inspected the code structure of fixes and workarounds to see whether they were different.\n\nAmong the 60 pairs of cross-project bugs, nine have not yet been fixed in the upstream projects. Therefore, we could not compare their workarounds with upstream fixes. In RQ1, we only investigated the remaining 51 pairs of cross-project bugs.\n\n1) Statistical comparison of the size\n\nTABLE I shows the minimum, the maximum, and the average values, as well as the 25th, 50th, and 75th percentiles of workaround/fix size.\n\nTABLE I. THE SIZES OF THE UPSTREAM FIXES AND DOWNSTREAM WORKAROUNDS\n\n| | Min. | Max. | Avg. | 25th | 50th | 75th |\n|---|---|---|---|---|---|---|\n| #Files: Fixes | 1 | 8 | 3 | 2 | 2 | 4 |\n| #Files: Workarounds | 1 | 6 | 2 | 1 | 2 | 3 |\n| #SLOC: Fixes | 1 | 829 | 93 | 19 | 36 | 105 |\n| #SLOC: Workarounds | 1 | 662 | 61 | 10 | 26 | 45 |\n\nTo facilitate a visual comparison, we also use boxplots to illustrate the size distributions (Fig. 1). It is clear that the number of modified files and the number of modified lines of code in workarounds are both smaller than those in fixes.\n\n[Fig. 1. Boxplots of the size distributions of the upstream fixes and downstream workarounds.]\n\nWe also adopted the Wilcoxon signed-rank test and Cliff’s $\\delta$ effect size to statistically compare the workarounds and fixes. The results are shown in TABLE II. The p-values less than 0.05 indicate that the number of modified files and the number of modified lines of code are significantly different between the workarounds and fixes.
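These paired comparisons are straightforward to reproduce. A minimal sketch, assuming SciPy is installed, with toy numbers standing in for the study’s measurements:\n\n```python\n# Paired, non-parametric comparison of workaround vs. fix sizes.\nfrom itertools import product\nfrom scipy.stats import wilcoxon\n\ndef cliffs_delta(xs, ys):\n    # Cliff's delta: P(x > y) - P(x < y) over all cross pairs.\n    gt = sum(x > y for x, y in product(xs, ys))\n    lt = sum(x < y for x, y in product(xs, ys))\n    return (gt - lt) / (len(xs) * len(ys))\n\nworkarounds = [10, 26, 45, 1, 12]  # toy SLOC values, not the study's data\nfixes = [19, 36, 105, 1, 80]\n\nstat, p = wilcoxon(workarounds, fixes)  # paired signed-rank test\nprint(p, cliffs_delta(workarounds, fixes))\n```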
The values of Cliff’s $\\delta$ mean that the difference in the number of changed files between them is small, but the difference in the number of modified lines of code is large.\n\n| | #Files | #SLOC |\n|---|---|---|\n| P-value | 0.019 | 0.014 |\n| $|\\delta|$ | 0.232 | 0.771 |\n\nCombining the boxplots and the results of the statistical tests, we conclude that the size of the workaround is significantly smaller than the size of the corresponding upstream fix.\n\n2) Inspection of code\n\nAfter statistically comparing the size of the downstream workarounds and the corresponding upstream fixes, we looked into their code to make a further investigation.\n\nIn general, for eight out of the 51 cross-project bugs, the upstream fix and the corresponding downstream workaround were designed in the same manner. The developers from both sides had similar ideas about how to modify their own projects when facing the bug. For example, using the Astropy normalizer led to a TypeError in Sunpy when playing a mapcube peek animation (sunpy/sunpy#1532). It was caused by a bug in the ImageNormalize class of Astropy, which did not include a call to the inherited method autoscale_None() (astropy/astropy#4117). To address this problem, both Sunpy and Astropy used an explicit call to autoscale_None(). Fig. 2 shows the downstream workaround and upstream fix for this bug. Additionally, it is worth noting that the fix and the workaround were proposed by the same developer. Another example is shown in astropy/astropy#3052, which was caused by numpy/numpy#5251. The downstream workaround was just a copy of the upstream fix for the cross-project bug.\n\nFor the remaining 43 out of the 51 cross-project bugs, the downstream developers worked around them in a different way from what the upstream developers did to fix the bugs. This seems to accord with our intuition. Whether for within-project or cross-project bugs, a workaround is a short-term solution injected in a place other than the true root-cause location. For cross-project bugs, the workaround is placed in the downstream project where the upstream buggy method is called, while the ultimate fix is to repair the buggy method itself. Intuitively, the two kinds of modification are usually different, which is confirmed by our observations.\n\nIn Section IV.C, we will discuss the workaround patterns in detail.\n\nB. RQ2: Common Bug Features\n\nBy manually inspecting the issue reports of the 60 cross-project bugs, we found that some bugs did have something in common. In total, we identified three kinds of common features. Forty-nine of the investigated bugs could be classified into these three kinds, while the remaining 11 bugs have distinct characteristics and cannot be put into any category.\n\n1) Emerging cases\n\nA cross-project bug was reported when the downstream project encountered an emerging case that the upstream method did not cover. Thirty-nine of the 60 cross-project bugs could be classified into this kind.
More specifically, we divided the 39 bugs into two subcategories.\n\nFirst, the original upstream method could not process certain types or forms of data. For example, astropy/astropy#3052 reported that a method in NumPy did not use a suitable format for Unicode data (numpy/numpy#5251). Astropy/astropy#4658 was caused by np.median from NumPy, which could not handle masked arrays (numpy/numpy#7330). Luca-dex/pyTSA#18 worked around an upstream bug that Pandas could not read csv files if the column separator was not a comma (pandas-dev/pandas#2733).\n\nSecond, the upstream method might not consider the processing of edge cases. For example, the method utilities.autowrap.ufuncify in Sympy failed when the length of the symbol list was larger than 31 (sympy/sympy#9593). The failure resulted from an error in the method frompyfunc of NumPy, which did not check the number of arguments (numpy/numpy#5672).\n\n2) Wrong outputs\n\nSometimes, the upstream methods might produce wrong results with specific inputs, which could break their downstream projects. Six of the studied upstream bugs produced wrong outputs.\n\nThe wrong outputs are partly caused by the incorrect design of the functionality. Blaze/odo#331 was caused by the wrong output of a datetime64 series in Pandas. The method should return NaT instead of NaN with an empty series (pandas-dev/pandas#11245). In NumPy, np.log1p(inf) returned NaN while it should return Inf (numpy/numpy#4225), which led to an undesired result in Nengo (nengo/nengo#260).\n\n```\n@@ -203,7 +205,11 @@ def updatefig(i, im, annotate, ani_data, removes):\n     im.set_array(ani_data[i].data)\n     im.set_cmap(self.maps[i].plot_settings['cmap'])\n-    im.set_norm(self.maps[i].plot_settings['norm'])\n+    norm = deepcopy(self.maps[i].plot_settings['norm'])\n+    # The following explicit call is for bugged versions of Astropy's ImageNormalize\n+    norm.autoscale_None(ani_data[i].data)\n+    im.set_norm(norm)\n```\n\n(a). The downstream workaround\n\n```\n@@ -67,5 +67,8 @@ def __call__(self, values, clip=None):\n     values = np.array(values, copy=True, dtype=float)\n+    # Set default values for vmin and vmax if not specified\n+    self.autoscale_None(values)\n+    # Normalize based on vmin and vmax\n     np.subtract(values, self.vmin, out=values)\n```\n\n(b). The upstream fix\n\nFig. 2. The comparison of the code for the downstream workaround and the corresponding upstream fix.\n\nSome other unexpected outputs of the upstream methods were introduced by careless incompatible changes made when the upstream developers fixed another bug or developed a new feature. For example, the method combine_first in a new version of Pandas performed an unwanted conversion of dates to integers (pandas-dev/pandas#3593), which made some modules of Clair unusable (eike-welk/clair/#43).\n\n3) Python 3 incompatibility\n\nSome upstream methods could not perform correctly under Python 3 while they could work perfectly under Python 2. Then, when running downstream projects in Python 3, the original upstream method resulted in a bug. For example, the method loadtxt in NumPy failed with complex data in Python 3 (numpy/numpy#5655), which affected its downstream project msmtools (markovmodel/msmtools#18). In total, four of the 60 cross-project bugs are due to Python 3 incompatibility.\n\nC. RQ3: Workaround Patterns\n\nAfter investigating the characteristics of cross-project bugs with workarounds, we summarized the common patterns from the studied workarounds.
Generally, we found four workaround patterns, covering the workarounds for 37 cross-project bugs.\n\n1) Pattern 1: Using a different method\n\nWhen an upstream method that the downstream project uses has a bug, a simple workaround is to replace the buggy method with a similar one.\n\nExample: The Obspy developer experienced segmentation faults on certain systems when constructing a NumPy array (obspy/obspy#536). After investigation, this bug turned out to be caused by an error in np.array (numpy/numpy#3175). The downstream developers worked around the cross-project bug by using np.frombuffer instead of np.array. Fig. 3 shows the downstream workaround.\n\nTen out of the 60 workarounds were designed to adopt another method that could provide the same functionality. However, most of the replacements were provided by the original upstream projects. As in the example above, np.frombuffer and np.array come from the same project, NumPy. This phenomenon implies two things. First, some libraries may tend to develop multiple methods with overlapping capabilities. Second, the downstream projects are not willing to change their dependencies. This is reasonable, since adding a new dependency means that more effort must be spent on the downstream side to understand the release cycle of the new upstream project and to coordinate with it.\n\nThe main challenge in proposing this kind of workaround lies in two aspects. The first is to find a replacement method that is preferably provided by the same upstream project, or at least by a stable project. Second, the parameters should be carefully modified to fit the new method, since it may require different kinds of parameters than the buggy method. The challenge also indicates that an automatic tool to recommend similar APIs and adapt parameters would be useful for developers to work around a cross-project bug.\n\n2) Pattern 2: Conditionally using the original method\n\nAs we have stated in IV.B, most of the cross-project bugs are caused by one or more uncovered cases of the upstream methods. Therefore, an intuitive way to work around the bug is to use the method only in the cases that will not result in a failure.\n\nExample: Scipy/scipy#3596 recorded a bug that scipy.signal.fftconvolve did not work well in multithreaded environments. After digging into this issue, the developers found that scipy.signal.fftconvolve made use of numpy.fft.rfftn/irfftn for non-complex inputs and that it was NumPy’s FFT routines that were actually not thread safe. Though numpy/numpy#4655 later fixed the bug in NumPy, the SciPy developers still thought that they should work around it on their side, because they support older NumPy versions that do not have the fix. Fig. 4 shows the downstream workaround. For pre-1.9 NumPy, if there are non-complex inputs, SciPy only calls numpy.fft.rfftn/irfftn from one thread at a time to be thread safe. In other cases, they use their own FFT method instead.\n\nHowever, though this workaround helped the users get out of trouble, it seemed a little complex. A developer proposed that the easiest workaround would be to convert the non-complex inputs to complex inputs (by adding 0j) so they were processed by SciPy’s FFT routine instead of NumPy’s buggy RFFT method. This idea was rejected by the other developers: because NumPy’s RFFT method is significantly faster, it is better to use this method whenever possible.
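The general shape of such a conditional guard is easy to sketch; the names and version threshold below are hypothetical, and the actual SciPy workaround is more involved:\n\n```python\n# Pattern 2 sketch: call the fast upstream routine freely on fixed\n# releases, and serialize calls on releases known to be thread-unsafe.\nimport threading\nimport numpy as np\nfrom packaging.version import Version\n\n_rfft_lock = threading.Lock()\n_RFFT_THREAD_SAFE = Version(np.__version__) >= Version('1.9.0')\n\ndef safe_rfftn(a):\n    if _RFFT_THREAD_SAFE:\n        return np.fft.rfftn(a)\n    with _rfft_lock:\n        return np.fft.rfftn(a)\n```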
Just as another SciPy developer commented, “Whatever fix is done on the SciPy side, it would be nice if it didn’t prevent someone who had a new enough (fixed) NumPy from using the newer RFFT method multithreaded.”\n\n```\n@@ -109,5 +109,5 @@ def getSequenceNumber(self):\n     def getMSRecord(self):\n         # following from obspy.mseed.tests.test_libmseed\n         msr = clibmseed.msr_init(C.POINTER(MSRecord)())\n-        pyobj = np.array(self.msrecord)\n+        pyobj = np.frombuffer(self.msrecord)\n         errcode = \\\n```\n\nFig. 3. The downstream workaround injected in Obspy\n\nFifteen out of the 60 workarounds were designed to restrict the use of the buggy upstream method to its covered cases. There are two key points in proposing a workaround of this kind. First, the developers should determine under what conditions the originally used upstream method would fail, i.e., the uncovered cases. Usually, developers can find the answer during the process of diagnosing the bug. After that, it is important to decide how to deal with the failing cases. When inspecting these 15 workarounds, we found that the developers either made use of another method or simply raised an error or an exception (e.g., sympy/sympy#9593).\n\n3) Pattern 3: Adapting the inputs to use the original method\n\nTo avoid the failure caused by the uncovered cases, developers may also choose to convert their inputs into a processable form which can be correctly handled by the buggy upstream method.\n\nExample: Pyhrf/pyhrf#146 reported a test failure which seemed to come from scipy.misc.fromimage. When trying to open 1-bit images, the SciPy method would produce a segmentation fault. In order to avoid the failure, the Pyhrf developers decided to first convert the 1-bit image into an 8-bit image, which could be dealt with by the SciPy method. Fig. 5 shows the downstream workaround.\n\nNine out of the 60 studied workarounds conform to this pattern. Though it seems to be a direct way to convert an uncovered case into a covered case in order to use the original upstream routine, this method is not always feasible.\n\n4) Pattern 4: Converting the outputs of the original method\n\nTo work around buggy upstream methods that produce wrong outputs with certain inputs, the downstream developers may choose to convert the wrong results into the desired ones.\n\nExample: The method combine_first in Pandas falsely converted dates to integers (pandas-dev/pandas#3593). To bypass the bug, its downstream project Clair explicitly called pd.to_datetime to convert the time-related data from integers back to dates (eike-welk/clair/#43). Fig. 6 shows the downstream workaround.\n\nApart from this example, two other downstream projects worked around cross-project bugs in this way.\nV. DISCUSSION\n\nIn this section, we discuss the findings about downstream workarounds.\n\nA. Workaround Generation\n\nMa et al. showed that the workaround was the most common practice that the downstream developers used to cope with cross-project bugs [2]. Workarounds play a significant role since they can bypass the bad impact of bugs while waiting for upstream fixes, as well as shield the end user from being affected even when they use a buggy upstream version [2]. Therefore, when suffering a cross-project bug, it is of great use if the downstream developers can propose a workaround in a timely manner.\n\nIn Section IV, we summarized the 60 cross-project bugs with workarounds into three main categories. The largest number of bugs were new cases that the upstream method could not process. To temporarily handle the problem, the downstream developers may adopt another method with similar functionality instead, limit the use of the buggy method to the cases that it can handle, or convert the emerging case into a form that the buggy method can deal with. When facing cross-project bugs which produce wrong results with certain inputs, the downstream developers may continue to use the original method, but then explicitly transform the outputs into the correct form.
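The latter options usually amount to a thin adapter at the call site. A hedged sketch of Patterns 3 and 4 combined, modeled loosely on pyhrf/pyhrf#146 and eike-welk/clair#43; the function names, file paths, and column names are invented:\n\n```python\n# Pattern 3: adapt the input -- 1-bit images crashed the upstream routine,\n# so convert them to 8-bit grayscale before calling it.\nimport pandas as pd\nfrom PIL import Image\n\ndef load_image_for_upstream(path):\n    img = Image.open(path)\n    if img.mode == '1':\n        img = img.convert('L')\n    return img\n\n# Pattern 4: convert the output -- a buggy pandas release returned integers\n# where datetimes were expected, so convert them back after the call.\ndef combine_frames(df_new, df_old):\n    merged = df_new.combine_first(df_old)\n    merged['time'] = pd.to_datetime(merged['time'])\n    return merged\n```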
Summarizing the bug types and common workaround patterns will help developers efficiently develop a suitable workaround. At the same time, it can also guide the design of (automatic) workaround generation tools. Following the discussion in Section IV.C, such a tool is supposed to do the following tasks. First, it can search for alternative methods which have the same functionality as the buggy method. Second, it can extract the conditions under which the upstream methods do not work correctly. Third, it can adapt the input data to suitable forms that the upstream methods are able to process.\n\nIn our opinion, a preferred workaround should follow three principles, whether generated by hand or by tool. First, the workaround should suppress or bypass the upstream bug to make the downstream project run normally. Second, the workaround is supposed to make as few code changes as possible. Ma et al. indicated that workarounds would be removed afterwards [2]. Therefore, the workaround should preferably be designed in a way that does not affect other modules and makes it easy to deprecate. Third, the workaround is supposed to use efficient methods in order not to reduce the performance of the project.\n\nB. Workaround Recommendation\n\nIn a software ecosystem, some central projects are used by multiple other projects. For example, in the scientific Python ecosystem, NumPy is the basic tool and nearly all the projects within this ecosystem depend on it. Therefore, an error in a popular project like NumPy may break more than one downstream project. All of them may need to work around the cross-project bug while waiting for an upstream fix. Under this circumstance, a downstream project could benefit from a responsive sibling project which has already proposed a workaround for the same bug.\n\nDask/dask#297 shows an example. The project Dask was affected by a NumPy bug (numpy/numpy#3484). A developer then found that another project, Scikit-learn, was suffering from the same bug. After digging into the code of Scikit-learn, he indicated that Dask could learn from Scikit-learn. He commented, “Possible solution would be to add a function for python 3 compatibility, as scikit-learn did: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/fixes.py#L8.” Then, Dask copied the solution of Scikit-learn into their own code as their workaround for the bug.\n\nAn existing workaround in a sibling project reduces the workload of the developers suffering from the same bug. However, finding a suitable workaround from another project seems to be a non-trivial task. First, the developers should find out what other projects are also affected by the cross-project bug. Then, they should get to know how these affected projects deal with the bug. Last, they have to select an appropriate workaround from these projects and adapt it to their own project. Therefore, a workaround recommendation tool which automates the process could be useful.\n\nThis tool should be designed to have at least three functionalities. First, it can predict which other projects may be influenced by the same bug and from which the workaround can be learned. Second, it can check the code changes of those projects to extract downstream workarounds. Last, it can compare the context of the affected modules in different projects to rank the workarounds.
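The first functionality can be prototyped on top of GitHub’s issue search. A rough sketch (the endpoint and fields are real, but pagination, authentication, and rate-limit handling are omitted, and matching on a bare issue URL is an assumption):\n\n```python\n# Find repositories whose issues mention a given upstream issue URL,\n# as candidate siblings that may already carry a workaround.\nimport requests\n\ndef projects_mentioning(upstream_issue_url: str):\n    query = upstream_issue_url + ' is:issue'\n    resp = requests.get('https://api.github.com/search/issues',\n                        params={'q': query}, timeout=30)\n    resp.raise_for_status()\n    return {item['repository_url'] for item in resp.json()['items']}\n```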
Developers face several technical challenges in building such a tool, which deserves a further study.\n\nC. Workaround Removing\n\nAs we have stated before, the downstream workaround is a temporary solution injected in the downstream project to cope with a cross-project bug. Unlike the corresponding upstream fix, which is an ultimate and permanent solution, the workaround may be modified or discarded later [2]. We indeed found some cases showing that the developers intend to remove or change the workarounds in the future.\nMaterialsinnovation/pymks#132 reported that Pymks broke down due to a bug in Scikit-learn (scikit-learn/scikit-learn#3984). The downstream developer added the keyword argument size as a short-term solution to the current dimension requirement of the buggy method from Scikit-learn. He then wrote in the commit, “Sklearn developers have already removed the dimension requirement on development version of the code. Once this version is released, this keyword argument should be removed.” In pandas-dev/pandas#9276, the Pandas developer proposed a workaround for a NumPy bug (numpy/numpy#5562) with a comment that they would reconsider that decision once the upstream project fixed the bug. Sympy/sympy#9593 included a workaround for another NumPy bug (numpy/numpy#5672). The developer left a comment in the code that “maxargs is set by numpy compile-time constant NPY_MAXARGS. If a future version of numpy modifies or removes this restriction, this variable should be changed or removed.”\n\nFrom these examples, we see that the downstream developers cannot decide the exact time to modify or remove the workarounds, because that time depends on when the responsible upstream projects accomplish certain tasks (e.g., releasing a new version or modifying specific variables). Consequently, the downstream developers need to track the progress of the upstream projects they depend on, in order to maintain their workarounds accordingly. This clearly adds to the burden of the downstream maintainers, which is confirmed by the respondents of the survey conducted by Ma et al. [2].\n\nIn order to reduce the maintenance burden of the downstream developers, an automatic workaround modification or removal tool is desirable. The tool is supposed to detect the occurrence of upstream events which may influence the workaround and notify the developers. Another key function of the tool is to (semi-)automatically remove the workarounds once they can be deprecated.\n\nAdditionally, the time to remove the workarounds is also worth studying. The workaround is a landmark case of the coordination between the upstream and downstream projects during the fixing process of cross-project bugs. Studying the lifecycle of a workaround will help to understand how developers on both sides collaborate with each other to fix cross-project bugs and how developers from different projects cooperate within a software ecosystem.\n\nVI. THREATS TO VALIDITY\n\nIn this section, we discuss the threats to validity of our study.\n\nThe first threat concerns the accuracy of the identification of workarounds and fixes. Kim et al.
pointed out that high-quality bug-fix information is needed to avoid superficial conclusions, but that many bug-fix datasets are polluted [28]. In order to identify the workarounds and fixes, two authors individually reviewed the issue reports and the related commits indicated in the reports. They then cross-checked each other’s results to maximize the accuracy of the data under investigation.\n\nThe second threat concerns the unknown effect of the deviation of the variables under statistical tests (the size of the workaround/fix) from the normal distribution. To mitigate this threat, our conclusions have been supported by proper statistical tests. We chose the Wilcoxon signed-rank test and the Cliff’s $\\delta$ effect size because they are non-parametric tests which do not require any assumption about the underlying data distribution.\n\nThe third threat concerns the researchers’ preconceptions. The two authors that conducted the manual analysis followed the same procedure and criteria in collecting the studied dataset, identifying and comparing fixes and workarounds, as well as summarizing bug features and workaround patterns. However, it is in general difficult to completely eliminate the influence of researchers’ preconceptions. In order to minimize personal bias, they discussed the results, especially the unclear cases, together.\n\nThe last threat concerns the generalization of our empirical results. We conducted our study on the scientific Python ecosystem. However, cross-project bugs and downstream workarounds do not only occur within this specific ecosystem. We cannot assume that our results generalize beyond the specific environment in which the study was conducted. Further validation on other ecosystems is desirable.\n\nVII. CONCLUSION AND FUTURE WORK\n\nIn previous work, proposing a workaround was shown to be a common practice for downstream developers to bypass the impact of a cross-project bug. In this study, we investigated the characteristics of the downstream workarounds. First, we manually identified 60 cross-project bugs which have a workaround, out of the 271 cross-project bugs in the scientific Python ecosystem. Then, with these data, we empirically compared the workarounds with their corresponding upstream fixes, and summarized the bug features and workaround patterns. The main findings of this study are as follows:\n\n- In general, the size of the workaround is significantly smaller than that of the corresponding fix. The fix and the workaround usually have different code structures.\n- The cross-project bugs which the downstream developers worked around are usually caused by an emerging case that the upstream method cannot process, by a wrong output with certain inputs, or by Python 3 incompatibility.\n- Four patterns of workarounds are identified: using another method with similar functionality, restricting the buggy method to the range it can process, converting the inputs to a processable form, and correcting the outputs after using the buggy method.\n\nThe findings in this study also indicate the need for, and the feasibility of, tools supporting workaround generation, recommendation, maintenance and removal. In future work, we will continue to develop these supporting tools, as well as investigate the lifecycle of workarounds in more kinds of software ecosystems.\nACKNOWLEDGMENT\n\nThis work is supported by the National Natural Science Foundation of China (61472175, 61472178, 91418202) and the Natural Science Foundation of Jiangsu Province (BK20130014).\n\nREFERENCES\n\n[1] E.
Kalliamvakou, G. Gousios, K. Blincoe, L. Singer, D. M. German, and D. Damian, \"An in-depth study of the promises and perils of mining GitHub\", Empirical Software Engineering, pp. 1\u201337, 2015.\n\n[2] W. Ma, L. Chen, X. Zhang, Y. Zhou, and B. Xu, \"How do developers fix cross-project correlated bugs? A case study on the GitHub scientific Python ecosystem\", in Proceedings of the 39th International Conference on Software Engineering, 2017, p. Accepted.\n\n[3] A. Decan, T. Mens, M. Claes, and P. Grosjean, \"When GitHub meets CRAN: an analysis of inter-repository package dependency problems\", in Proceedings of International Conference on Software Analysis, Evolution, and Reengineering, 2016, pp. 493\u2013504.\n\n[4] B. Adams, R. Kavanagh, A. E. Hassan, and D. M. German, \"An empirical study of integration activities in distributions of open source software\", Empirical Software Engineering, vol. 21, no. 3, pp. 960\u20131001, Jun. 2016.\n\n[5] G. Bavota, G. Canfora, M. Di Penta, R. Oliveto, and S. Panichella, \"How the Apache community upgrades dependencies: an evolutionary study\", Empirical Software Engineering, vol. 20, no. 5, pp. 1275\u20131317, Oct. 2015.\n\n[6] L. Villarroel, G. Bavota, B. Russo, R. Oliveto, and M. Di Penta, \"Release planning of mobile apps based on user reviews\", in Proceedings of the 38th International Conference on Software Engineering, 2016, pp. 14\u201324.\n\n[7] H. Valdivia Garcia and E. Shihab, \"Characterizing and predicting blocking bugs in open source projects\", in Proceedings of the 11th Working Conference on Mining Software Repositories, 2014, pp. 72\u201381.\n\n[8] X. Xia, D. Lo, E. Shihab, X. Wang, and X. Yang, \"ELBlocker: Predicting blocking bugs with ensemble imbalance learning\", Information and Software Technology, vol. 61, pp. 93\u2013106, May 2015.\n\n[9] H. Zhong and Z. Su, \"An empirical study on real bug fixes\", in Proceedings of the 37th International Conference on Software Engineering, 2015, vol. 1, pp. 913\u2013923.\n\n[10] K. Pan, S. Kim, and E. J. Whitehead, \"Toward an understanding of bug fix patterns\", Empirical Software Engineering, vol. 14, no. 3, pp. 286\u2013315, Jun. 2009.\n\n[11] J. Park, M. Kim, and D.-H. Bae, \"An empirical study of supplementary patches in open source projects\", Empirical Software Engineering, vol. 22, no. 1, pp. 436\u2013473, May 2016.\n\n[12] Y. Jiang, B. Adams, and D. M. German, \"Will my patch make it? and how fast?: case study on the Linux kernel\", in Proceedings of the 10th Working Conference on Mining Software Repositories, 2013, pp. 101\u2013110.\n\n[13] A. T. Misirli, E. Shihab, and Y. Kamei, \"Studying high impact fix-inducing changes\", Empirical Software Engineering, vol. 21, no. 2, pp. 605\u2013641, Apr. 2016.\n\n[14] J. Echeverria, F. Perez, A. Abellanas, J. I. Panach, C. Cetina, and O. Pastor, \"Evaluating bug-fixing in Software Product Lines: an industrial case study\", in Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 2016, pp. 1\u20136.\n\n[15] C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer, \"GenProg: a generic method for automatic software repair\", IEEE Transactions on Software Engineering, vol. 38, no. 1, pp. 54\u201372, Jan. 2012.\n\n[16] C. Le Goues, M. Dewey-Vogt, S. Forrest, and W. Weimer, \"A systematic study of automated program repair: fixing 55 out of 105 bugs for $8 each\", in Proceedings of the 34th International Conference on Software Engineering, 2012, pp. 3\u201313.\n\n[17] S. Mechtaev, J. Yi, and A. 
Roychoudhury, \"Angelix: scalable multiline program patch synthesis via symbolic analysis\", in Proceedings of the 38th International Conference on Software Engineering, 2016, pp. 691\u2013701.\n\n[18] Z. Gu, E. T. Barr, D. J. Hamilton, and Z. Su, \"Has the bug really been fixed?\", in Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering, 2010, vol. 1, p. 55.\n\n[19] M. Leszak, D. E. Perry, and D. Stoll, \"A case study in root cause defect analysis\", in Proceedings of the 22nd international conference on Software engineering, 2000, pp. 428\u2013437.\n\n[20] \"Workaround - Project Management Knowledge\". [Online]. Available: https://project-management-knowledge.com/definitions/w/workaround/. [Accessed: 08-Apr-2017].\n\n[21] E. Murphy-Hill, T. Zimmermann, C. Bird, and N. Nagappan, \"The design of bug fixes\", in Proceedings of 35th International Conference on Software Engineering, 2013, pp. 332\u2013341.\n\n[22] A. J. Ko, R. DeLine, and G. Venolia, \"Information needs in collocated software development Teams\", in Proceedings of the 29th International Conference on Software Engineering, 2007, pp. 344\u2013353.\n\n[23] E. Berglund, \"Communicating bugs: global bug knowledge distribution\", Information and Software Technology, vol. 47, no. 11, pp. 709\u2013719, 2005.\n\n[24] J. D. Gibbons and D. A. Wolfe, Nonparametric Statistical Inference. 2003.\n\n[25] E. a. Freeman and G. G. Moisen, \"A comparison of the performance of threshold criteria for binary classification in terms of predicted prevalence and kappa\", Ecological Modelling, vol. 217, no. 1\u20132, pp. 48\u201358, 2008.\n\n[26] G. MacBeth, E. Razumiejczyk, and R. Ledsema, \"Cliff\u2019s Delta calculator: a non-parametric effect size program for two groups of observations\", Universitas Psychologica, vol. 10, no. 2, pp. 545\u2013555, 2012.\n\n[27] Y. Yang, Y. Zhou, H. Lu, L. Chen, Z. Chen, and B. Xu, \"Are slice-based cohesion metrics actually useful in effort-aware post-release fault-proneness prediction? An empirical study\", IEEE Transactions on Software Engineering, vol. 41, no. 4, pp. 331\u2013357, 2015.\n\n[28] S. Kim, H. Zhang, R. Wu, and L. Gong, \"Dealing with noise in defect prediction\", in Proceedings of the 33rd International Conference on Software Engineering, 2011, pp. 481\u2013490.", "source": "olmocr", "added": "2025-06-23", "created": "2025-06-23", "metadata": {"Source-File": "/home/nws8519/git/adaptation-slr/studies_pdfs/019_ding.pdf", "olmocr-version": "0.1.76", "pdf-total-pages": 10, "total-input-tokens": 35742, "total-output-tokens": 11997, "total-fallback-pages": 0}, "attributes": {"pdf_page_numbers": [[0, 5187, 1], [5187, 11269, 2], [11269, 17039, 3], [17039, 21620, 4], [21620, 26036, 5], [26036, 31112, 6], [31112, 33272, 7], [33272, 38395, 8], [38395, 44298, 9], [44298, 50151, 10]]}}
{"id": "a1408b6224c6e6c9787abc43b9bf1c7f5d16adbf", "text": "Usage, Costs, and Benefits of Continuous Integration in Open-Source Projects\n\nMichael Hilton \nOregon State University, USA \nhiltonm@eecs.oregonstate.edu\n\nTimothy Tunnell \nUniversity of Illinois, USA \ntunnell2@illinois.edu\n\nKai Huang \nUniversity of Illinois, USA \nkhuang29@illinois.edu\n\nDarko Marinov \nUniversity of Illinois, USA \nmarinov@illinois.edu\n\nDanny Dig \nOregon State University, USA \ndigd@eecs.oregonstate.edu\n\nABSTRACT\nContinuous integration (CI) systems automate the compilation, building, and testing of software. Despite CI rising as a big success story in automated software engineering, it has received almost no attention from the research community. For example, how widely is CI used in practice, and what are some costs and benefits associated with CI? Without answering such questions, developers, tool builders, and researchers make decisions based on folklore instead of data.\n\nIn this paper, we use three complementary methods to study the usage of CI in open-source projects. To understand which CI systems developers use, we analyzed 34,544 open-source projects from GitHub. To understand how developers use CI, we analyzed 1,529,291 builds from the most commonly used CI system. To understand why projects use or do not use CI, we surveyed 442 developers. With this data, we answered several key questions related to the usage, costs, and benefits of CI. Among our results, we show evidence that supports the claim that CI helps projects release more often, that CI is widely adopted by the most popular projects, as well as finding that the overall percentage of projects using CI continues to grow, making it important and timely to focus more research on CI.\n\nCCS Concepts\n\u2022 Software and its engineering \u2192 Agile software development; Software testing and debugging;\n\nKeywords\ncontinuous integration; mining software repositories\n\n1. INTRODUCTION\nContinuous Integration (CI) is emerging as one of the biggest success stories in automated software engineering. CI systems automate the compilation, building, testing and deployment of software. For example, such automation has been reported [22] to help Flickr deploy to production more than 10 times per day. Others [40] claim that by adopting CI and a more agile planning process, a product group at HP reduced development costs by 78%.\n\nThese success stories have led to CI growing in interest and popularity. Travis CI [17], a popular CI service, reports that over 300,000 projects are using Travis. The State of Agile industry survey [48], with 3,880 participants, found 50% of respondents use CI. The State of DevOps report [49] finds CI to be one of the indicators of \"high performing IT organizations\". Google Trends [11] shows a steady increase of interest in CI: searches for \u201cContinuous Integration\u201d increased 350% in the last decade.\n\nDespite the growth of CI, the only published research paper related to CI usage [53] is a preliminary study, conducted on 246 projects, which compares several quality metrics of projects that use or do not use CI. However, the study does not present any detailed information on how projects use CI. In fact, despite some folkloric evidence about the use of CI, there is no systematic study about CI systems.\n\nNot only do we lack basic knowledge about the extent to which open-source projects are adopting CI, but also we have no answers to many important questions related to CI. What are the costs of CI? 
Does CI deliver on the promised benefits, such as releasing more often, or helping make changes (e.g., to merge pull requests) faster? Do developers maximize the usage of CI? Despite the widespread popularity of CI, we have very little quantitative evidence on its benefits. This lack of knowledge can lead to poor decision making and missed opportunities. Developers who choose not to use CI can be missing out on the benefits of CI. Developers who do choose to use CI might not be using it to its fullest potential. Without knowledge of how CI is being used, tool builders can be misallocating resources instead of having data about where automation and improvements are most needed by their users. By not studying CI, researchers have a blind spot which prevents them from providing solutions to the hard problems that practitioners face.\n\nIn this paper we use three complementary methods to study the usage of CI in open-source projects. To understand the extent to which CI has been adopted by developers, and which CI systems developers use, we analyzed 34,544 open-source projects from GitHub. To understand how developers use CI, we analyzed 1,529,291 builds from Travis CI, the most commonly used CI service for GitHub projects (Section 4.1).\nTo understand why projects use or do not use CI, we surveyed 442 developers.\n\nWith this data, we answer several research questions that we grouped into three themes:\n\n**Theme 1: Usage of CI**\n\n**RQ1:** What percentage of open-source projects use CI?\n\n**RQ2:** What is the breakdown of usage of different CI services?\n\n**RQ3:** Do certain types of projects use CI more than others?\n\n**RQ4:** When did open-source projects adopt CI?\n\n**RQ5:** Do developers plan on continuing to use CI?\n\nWe found that CI is widely used, and the number of projects which are adopting CI is growing. We also found that the most popular projects are most likely to use CI.\n\n**Theme 2: Costs of CI**\n\n**RQ6:** Why do open-source projects choose not to use CI?\n\n**RQ7:** How often do projects evolve their CI configuration?\n\n**RQ8:** What are some common reasons projects evolve their CI configuration?\n\n**RQ9:** How long do CI builds take on average?\n\nWe found that the most common reason why developers are not using CI is lack of familiarity with CI. We also found that the average project makes only 12 changes to their CI configuration file and that many such changes can be automated.\n\n**Theme 3: Benefits of CI**\n\n**RQ10:** Why do open-source projects choose to use CI?\n\n**RQ11:** Do projects with CI release more often?\n\n**RQ12:** Do projects which use CI accept more pull requests?\n\n**RQ13:** Do pull requests with CI builds get accepted faster (in terms of calendar time)?\n\n**RQ14:** Do CI builds fail less on master than on other non-master branches?\n\nWe first surveyed developers about the perceived benefits of CI, then we empirically evaluated these claims. We found that projects that use CI release twice as often as those that do not use CI. We also found that projects with CI accept pull requests faster than projects without CI.\n\nThis paper makes the following contributions:\n\n1. **Research Questions:** We designed 14 novel research questions. We are the first to provide in-depth answers to questions about the usage, costs, and benefits of CI.\n\n2. **Data Analysis:** We collected and analyzed CI usage data from 34,544 open-source projects. 
Then we analyzed in-depth all CI data from a subset of 620 projects and their 1,529,291 builds, 1,503,092 commits, and 653,404 pull requests. Moreover, we surveyed 442 open-source developers about why they chose to use or not use CI.\n\n3. **Implications:** We provide practical implications of our findings from the perspective of three audiences: researchers, developers, and tool builders. Researchers should pay attention to CI because it is not a passing fad. For developers we list several situations where CI provides the most value. Moreover, we discovered several opportunities where automation can be helpful for tool builders.\n\nMore details about our data sets and results are available at http://cope.eecs.oregonstate.edu/CISurvey\n\n## 2. OVERVIEW OF CI\n\n### 2.1 History and Definition of CI\n\nThe idea of Continuous Integration (CI) was first introduced in 1991 by Grady Booch [26], in the context of object-oriented design: \u201cAt regular intervals, the process of continuous integration yields executable releases that grow in functionality at every release...\u201d This idea was then adopted as one of the core practices of Extreme Programming (XP) [23].\n\nHowever, the idea began to gain acceptance after a blog post by Martin Fowler [37] in 2000. The motivating idea of CI is that the more often a project can integrate, the better off it is. The key to making this possible, according to Fowler, is automation. Automating the build process should include retrieving the sources, compiling, linking, and running automated tests. The system should then give a \u201cyes\u201d or \u201cno\u201d indicator of whether the build was successful. This automated build process can be triggered either manually or automatically by other actions from the developers, such as checking in new code into version control.\n\nThese ideas were implemented by Fowler in CruiseControl [9], the first CI system, which was released in 2001. Today there are over 40 different CI systems, and some of the most well-known ones include Jenkins [12] (previously called Hudson), Travis CI [17], and Microsoft Team Foundation Server (TFS) [15]. Early CI systems usually ran locally, and this is still widely done for Jenkins and TFS. However, CI as a service has become more and more popular, e.g., Travis CI is only available as a service, and even Jenkins is offered as a service via the CloudBees platform [6].\n\n### 2.2 Example Usage of CI\n\nWe now present an example of CI that comes from our data. The pull request we are using can be found here: https://github.com/RestKit/RestKit/pull/2370. A developer named \u201cAdlai-Holler\u201d created pull request #2370 named \u201cAvoid Flushing In-Memory Managed Object Cache while Accessing\u201d to work around an issue titled \u201cDuplicate objects created if inserting relationship mapping using RKInMemoryManagedObjectCache\u201d for the project RestKit [13]. The developer made two commits and then created a pull request, which triggered a Travis CI build. The build failed, because of failing unit tests. A RestKit project member, \u201csegiddins\u201d, then commented on the pull request, and asked Adlai-Holler to look into the test failures. Adlai-Holler then committed two new changes to the same pull request. Each of these commits triggered a new CI build. The first build failed, but the second was successful. Once the CI build passed, the RestKit team member commented \u201cseems fine\u201d and merged the pull request.\n\n## 3. 
METHODOLOGY\n\nTo understand the extent to which CI is used and which CI systems developers use, we analyzed 34,544 open-source projects from GitHub with our breadth corpus. To understand how developers use CI, we analyzed 1,529,291 builds on the most popular CI system in our depth corpus. To understand why projects use or do not use CI, we surveyed 442 developers.\n3.1 Breadth Corpus\n\nThe breadth corpus has a large number of projects, and information about what CI services each project uses. We use the breadth corpus to answer broad questions about the usage of CI in open-source projects. We collected the data for this corpus primarily via the GitHub API. We first sorted GitHub projects by their popularity, using the star rating (whereby users can mark, or \u201cstar\u201d, some projects that they like, and hence each project can accumulate stars). We started our inspection from the top of the list, first by manually looking at the top 50 projects. We collected all publicly available information about how these projects use CI. We then used what we learned from this manual inspection to write a script to programmatically classify which CI service (if any) a project uses. The four CI services that we were able to readily identify manually and later by our script are (sorted in the order of their usage): Travis CI [17], CircleCI [5], AppVeyor [2], and Werker [18]. All of these services provide public API\u2019s which we queried to determine if a project is using that service.\n\nMoreover, we wanted to ensure that we had collected as complete data as possible. When we examined the data by hand, we found that several projects were using CloudBees [6], a CI service powered by the Jenkins CI. However, given a list of GitHub projects, there is no reliable way to programmatically identify from the GitHub API which projects use CloudBees. (In contrast, Travis CI uses the same organization and project names as GitHub, making it easy to check correspondence between Travis CI and GitHub projects.) We contacted CloudBees, and they sent us a list of open-source projects that have CloudBees build set up. We then wrote a script to parse that list, inspect the build information, and search for the corresponding GitHub repository (or repositories) for each build on CloudBees. We then used this data to identify the projects from our breadth corpus that use CloudBees. This yielded 1,018 unique GitHub repositories/projects. To check whether these projects refer to CloudBees, we searched for (case insensitive) \u201cCloudBees\u201d in the README files of these projects and found that only 256 of them contain \u201cCloudBees\u201d. In other words, had we not contacted CloudBees directly, using only the information available on GitHub, we would have missed a large number of projects that use CloudBees.\n\nOverall, the breadth corpus consists of 34,544 projects. For each project, we collected the following information: project name and owner, the CI system(s) that the project uses (if any), popularity (as measured by the number of stars), and primary programming language (as determined by GitHub).\n\n3.2 Depth Corpus\n\nThe depth corpus has fewer projects, but for each project we collect all the information that is publicly available. For this subset of projects, we collected additional data to gain a deeper understanding of the usage, costs, and benefits of CI. 
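To make the service-detection step of Section 3.1 concrete, the check against one CI service\u2019s public API can look roughly as follows. This is an illustrative sketch, not the authors\u2019 actual script: it assumes the Travis CI v3 API endpoint that served open-source projects and relies on Travis slugs mirroring GitHub\u2019s owner/name convention; current deployments may require an API token.\n\n```python\n# Sketch: classify whether a GitHub project has a Travis CI setup by\n# querying Travis CI's public API (v3). Endpoint and auth details are\n# assumptions based on public docs, not the paper's actual script.\nimport urllib.parse\n\nimport requests\n\nTRAVIS_API = \"https://api.travis-ci.org\"  # assumed open-source endpoint\n\ndef uses_travis(owner: str, repo: str) -> bool:\n    # Travis repository slugs mirror GitHub \"owner/name\", URL-encoded.\n    slug = urllib.parse.quote(f\"{owner}/{repo}\", safe=\"\")\n    resp = requests.get(\n        f\"{TRAVIS_API}/repo/{slug}\",\n        headers={\"Travis-API-Version\": \"3\"},\n        timeout=30,\n    )\n    # A 404 means Travis has never seen this repository; a 200 response\n    # with an \"active\" flag indicates a live CI setup.\n    if resp.status_code != 200:\n        return False\n    return bool(resp.json().get(\"active\"))\n\nprint(uses_travis(\"RestKit\", \"RestKit\"))  # the project from Section 2.2\n```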
Analyzing our breadth corpus, as discussed in Section 4.1, we learned that Travis CI is by far the most commonly used CI service among open-source projects. Therefore, we targeted projects using Travis CI for our depth corpus. First, we collected the top 1,000 projects from GitHub ordered by their popularity. Of those 1,000 projects, we identified 620 projects that use Travis CI, 37 use AppVeyor, 166 use CircleCI, and 3 use Werker. We used the Travis CI API\\(^1\\) to collect the entire build history for each project in our depth corpus, for a total of 1,529,291 builds. Using GHTorrent [39], we collected the full history of pull requests for each project, for a total of 653,404 pull requests. Additionally, we cloned every project in our corpus, to access the entire commit history and source code.\n\n3.3 Survey\n\nEven after collecting our diverse breadth and depth corpora, we were still left with questions that we could not answer from the online data alone. These questions were about why developers chose to use or not use CI. We designed a survey to help us answer a number of such \u201cwhy\u201d questions, as well as to provide us another data source to better understand CI usage. We deployed our survey by sending it to all the email addresses publicly listed as belonging to the organizations of all the top 1,000 GitHub projects (again rated by the popularity). In total, we sent 4,508 emails.\n\nOur survey consisted of two flows, each with three questions. The first question in both flows asked if the participant used CI or not. Depending on the answer they gave to this question, the second question asked the reasons why they use or do not use CI. These questions were multiple-choice, multiple-selection questions where the users were asked to select all the reasons that they agreed with. To populate the choices, we collected some common reasons for using or not using CI, as mentioned in websites [1,7], blogs [3,8,19], and Stack Overflow [14]. Optionally, the survey participants could also write their own reason(s) that we did not already list. The third question asked if the participant plans on using CI for future projects.\n\nTo incentivize participation, we raffled off a 50 USD gift card among the survey respondents. 442 (9.8% response rate) participants responded to our survey. Of those responses, 407 (92.1%) indicated that they do use CI, and 35 (7.9%) indicated that they do not use CI.\n\n4. RESULTS\n\nIn this section, we present the results to our research questions. Section 4.1 presents the results about the usage of CI. Section 4.2 discusses the costs of CI. Finally Section 4.3 presents the benefits of CI. Rather than presenting implications after each research question, we draw from several research questions to triangulate implications that we present in Section 5.\n\n4.1 Usage of CI\n\nTo determine the extent to which CI is used, we study what percentage of projects actively use CI, and we also ask developers if they plan to use CI in the future. Furthermore, we study whether the project popularity and programming language correlate with the usage of CI.\n\n**RQ1: What percentage of open-source projects use CI?**\n\nWe found that 40% of all the projects in our breadth corpus use CI. Table 1 shows the breakdown of the usage. 
Thus, CI is indeed used widely and warrants further investigation.\n\n\\(^1\\)We are grateful to the Travis CI developers for promptly resolving a bug report that we submitted; until they resolved it, one could not query the full build history of all projects.\nAdditionally, we know that our scripts do not find all CI usage (e.g., projects that run privately hosted CI systems, as discussed further in Section 6.2). We can reliably detect the use of (public) CI services only if their API makes it possible to query the CI service based on knowing the GitHub organization and project name. Therefore, the results we present are a lower bound on the total number of projects that use CI.\n\nTable 2: CI usage by service. The first row shows the percent of all CI projects using each service; the second row shows the total number of projects for each service. Percentages add up to more than 100 because some projects use multiple CI services.\n\n| | Travis | CircleCI | AppVeyor | CloudBees | Werker |\n|------------------|--------|----------|----------|-----------|--------|\n| % of CI projects | 90.1% | 19.1% | 3.5% | 1.6% | 0.4% |\n| # of projects | 12528 | 2657 | 484 | 223 | 59 |\n\nRQ2: What is the breakdown of usage of different CI services?\n\nNext we investigate which CI services are the most widely used in our breadth corpus. Table 2 shows that Travis CI is by far the most widely used CI service. Because of this result, we feel confident that our further analysis can focus on the projects that use Travis CI, and that analyzing such projects gives representative results for the usage of CI services in open-source projects.\n\nWe also found that some projects use more than one CI service. In our breadth corpus, of all the projects that use CI, 14% use more than one CI service. We think this is an interesting result which deserves future attention.\n\nRQ3: Do certain types of projects use CI more than others?\n\nTo better understand which projects use CI, we look for characteristics of projects that are more likely to use CI.\n\nCI usage by project popularity: We want to determine whether more popular projects are more likely to use CI. Our intuition is that if CI leads to better outcomes, then we would expect to see higher usage of CI among the most popular projects (or, alternatively, that projects using CI get better and thus are more popular). Figure 1 shows that the most popular projects (as measured by the number of stars) are also the most likely to use CI (Kendall\u2019s $\\tau$, $p < 0.00001$).\n\nWe group the projects from our breadth corpus into 64 even groups, ordered by number of stars. We then calculate the percent of projects in each group that use CI. Each group has around 540 projects. In the most popular (starred) group, 70% of projects use CI. As the projects become less popular, the percentage of projects using CI declines to 23%.\n\nObservation\n\nPopular projects are more likely to use CI.\n\nCI usage by language: We now examine CI usage by programming language. Do projects written primarily in certain languages use CI more than others? Table 3 shows, for each language in our breadth corpus, the percentage of projects that use CI, sorted by that percentage. The data shows that in fact certain languages use CI more than others.
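Both rank-correlation checks in this section (project popularity vs. CI usage here, and language popularity vs. CI percentage below) can be computed the same way. The following is a minimal sketch with SciPy, not the authors\u2019 script; the projects list and its adoption probabilities are synthetic stand-ins for the real corpus:\n\n```python\n# Sketch of the Kendall rank-correlation check: bin projects into 64\n# star-ordered groups, compute per-group CI usage, and correlate group\n# rank with usage. The data below is synthetic, for illustration only.\nimport random\nfrom scipy.stats import kendalltau\n\nrandom.seed(0)\nprojects = []  # (star_count, uses_ci) pairs; hypothetical stand-in data\nfor _ in range(34544):\n    stars = random.randint(0, 40000)\n    projects.append((stars, random.random() < 0.2 + 0.5 * stars / 40000))\n\nn_groups = 64\nordered = sorted(projects)           # ascending by star count\nsize = len(ordered) // n_groups      # roughly 540 projects per group\nrates = [\n    sum(uses_ci for _, uses_ci in ordered[g * size:(g + 1) * size]) / size\n    for g in range(n_groups)  # any remainder past 64*size is ignored\n]\n\n# Correlate group rank (a popularity proxy) with the CI usage rate.\ntau, p = kendalltau(range(n_groups), rates)\nprint(f\"tau = {tau:.2f}, p = {p:.2g}\")\n```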
Notice that a language\u2019s CI usage does not correlate significantly with its popularity (comparing each language\u2019s rank by number of projects with its rank by percentage of projects using CI; Kendall\u2019s $\\tau$, $p > 0.68$). In other words, the languages that use CI the most include both popular languages like Ruby and emerging languages like Scala. Similarly, among the languages that use CI less, we notice both popular languages such as Objective-C and Java, as well as less popular languages such as VimL.\n\nHowever, we did observe that many of the languages that have the highest CI usage are also dynamically-typed languages (e.g., Ruby, PHP, CoffeeScript, Clojure, Python, and JavaScript). One possible explanation may be that in the absence of a static type system which can catch errors early on, these languages use CI to provide extra safety.\n\nObservation\n\nWe observe a wide range of projects that use CI. The popularity of a language does not correlate with the probability that a project in that language uses CI.\n\nRQ4: When did open-source projects adopt CI?\n\nWe next study when projects began to adopt CI. Figure 2 shows the number of projects using CI over time. We answer this question with our depth corpus, because the breadth corpus does not have the date of the first build, which we use to determine when CI was introduced to the project. Notice that we are collecting data from Travis CI, which was founded in 2011 [10]. Figure 2 shows that CI has experienced steady growth over the last 5 years.\n\nWe also analyzed the age of each project when developers first introduced CI, and we found that the median time was around 1 year. Based on this data, we conjecture that while many developers introduce CI early in a project\u2019s development lifetime, it is not always seen as something that provides a large amount of value during the very initial development of a project.\n\nTable 3: CI usage by programming language. For each language, the columns tabulate: the number of projects from our corpus that predominantly use that language, how many of these projects use CI, and the percentage of projects that use CI.\n\n| Language | Total Projects | # Using CI | Percent CI |\n|------------|----------------|------------|------------|\n| Scala | 329 | 221 | 67.17 |\n| Ruby | 2721 | 1758 | 64.61 |\n| Go | 1159 | 702 | 60.57 |\n| PHP | 1806 | 982 | 54.37 |\n| CoffeeScript | 343 | 176 | 51.31 |\n| Clojure | 323 | 152 | 47.06 |\n| Python | 3113 | 1438 | 46.19 |\n| Emacs Lisp | 150 | 67 | 44.67 |\n| JavaScript | 8495 | 3692 | 43.46 |\n| Other | 1710 | 714 | 41.75 |\n| C++ | 1233 | 483 | 39.17 |\n| Swift | 723 | 273 | 37.76 |\n| Java | 3371 | 1188 | 35.24 |\n| C | 1321 | 440 | 33.31 |\n| C# | 652 | 188 | 28.83 |\n| Perl | 140 | 38 | 27.14 |\n| Shell | 709 | 185 | 26.09 |\n| HTML | 948 | 241 | 25.42 |\n| CSS | 937 | 194 | 20.70 |\n| Objective-C| 2745 | 561 | 20.44 |\n| VimL | 314 | 59 | 18.79 |\n\n**Observation**\n\nThe median time for CI adoption is one year.\n\n**RQ5:** Do developers plan on continuing to use CI? Is CI a passing \u201cfad\u201d in which developers will lose interest, or will it be a lasting practice? While only time will tell what the true answer is, to get some sense of what the future could hold, we asked developers in our survey if they plan to use CI for their next project. We asked them how likely they were to use CI on their next project, using a 5-point Likert scale ranging from definitely will use to definitely will not use.
Figure 3 shows that developers feel very strongly that they will be using CI for their next project. The top two options, \u2018Definitely\u2019 and \u2018Most Likely\u2019, account for 94% of all our survey respondents, and the average of all the answers was 4.54. While this seems like a resounding endorsement of the continued use of CI, we decided to dig a little deeper. Even among respondents who are not currently using CI, 53% said that they would \u2018Definitely\u2019 or \u2018Most Likely\u2019 use CI for their next project.\n\n**Observation**\n\nWhile CI is widely used in practice nowadays, we predict that in the future, CI adoption rates will increase even further.\n\n### 4.2 Costs of CI\n\nTo better understand the costs of CI, we analyze both the survey (where we asked developers why they believe CI is too costly to be worth using) and the data from our depth corpus. We estimate the cost to developers of writing and maintaining the configuration for their CI service. Specifically, we measure how often developers change their configuration files and study why they make those changes. We also analyze the cost in terms of the time to run CI builds. Note that the time a build takes to return a result can be unproductive time if the developers cannot proceed without knowing that result.\n\n**RQ6:** Why do open-source projects choose not to use CI?\n\nOne way to evaluate the costs of CI is to ask developers why they do not use CI. In our survey, we asked respondents whether they chose to use or not use CI, and if they indicated that they did not, then we asked them to tell us why.\n\nTable 4 shows the percentage of the respondents who selected particular reasons for not using CI. As mentioned before, we built the list of possible reasons by collecting information from various popular internet sources. Interestingly, the primary cost that respondents identified was not a technical cost; instead, the reason for not using CI was that \u201cThe developers on my project are not familiar enough with CI.\u201d We do not know if the developers are not familiar enough with the CI tools themselves (e.g., Travis CI), or if they are unfamiliar with all the work it will take to add CI to their project, including perhaps fully automating the build. To completely answer this question, more research is needed.\n\nThe second most selected reason was that the project does not have automated tests. This speaks to a real cost for CI, in that much of its value comes from automated tests, and some projects find that developing good automated test suites is a substantial cost.\n\nTable 4: Reasons developers gave for not using CI\n\n| Reason | Percent |\n|------------------------------------------------------------------------|---------|\n| The developers on my project are not familiar enough with CI | 47.00 |\n| Our project doesn\u2019t have automated tests | 44.12 |\n| Our project doesn\u2019t commit often enough for CI to be worth it | 35.29 |\n| Our project doesn\u2019t currently use CI, but we would like to in the future | 26.47 |\n| CI systems have too high maintenance costs (e.g., time, effort, etc.) | 20.59 |\n| CI takes too long to set up | 17.65 |\n| CI doesn\u2019t bring value because our project already does enough testing | 5.88 |
Even in the cases where developers had automated tests, some questioned the use of CI in particular, and regression testing in general; one respondent (P74) even said \u201cIn 4 years our tests have yet to catch a single bug.\u201d\n\nObservation\n\nThe main reason why open-source projects choose not to use CI is that the developers are not familiar enough with CI.\n\nRQ7: How often do projects evolve their CI configuration?\n\nWe ask this question to identify how often developers evolve their CI configurations. Is it a \u201cwrite-once-and-forget-it\u201d situation, or is it something that evolves constantly? The Travis CI service is configured via a YAML [20] file, named .travis.yml, in the project\u2019s root directory. YAML is a human-friendly data serialization standard. To determine how often a project has changed its configuration, we analyzed the history of every .travis.yml file and counted how many times it has changed. We calculate the number of changes from the commits in our depth corpus. Figure 4 shows the number of changes/commits to the .travis.yml file over the life of the project.\n\nFigure 4: Number of changes to CI configs; the median number of changes is 12\n\nWe observe that the median number of changes to a project\u2019s CI configuration is 12, but one of the projects changed its CI configuration 266 times. This leads us to conclude that many projects set up CI once and then have minimal involvement (25% of projects have 5 or fewer changes to their CI configuration), but some projects do find themselves changing their CI setup quite often.\n\nObservation\n\nSome projects change their configurations relatively often, so it is worthwhile to study what these changes are.\n\nTable 5: Reasons for CI config changes\n\n| Config Area | Total Edits | Percentage |\n|---------------------------|-------------|------------|\n| Build Matrix | 9718 | 14.70 |\n| Before Install | 8549 | 12.93 |\n| Build Script | 8328 | 12.59 |\n| Build Language Config | 7222 | 10.92 |\n| Build Env | 6900 | 10.43 |\n| Before Build Script | 6387 | 9.66 |\n| Install | 4357 | 6.59 |\n| Whitespace | 3226 | 4.88 |\n| Build platform Config | 3058 | 4.62 |\n| Notifications | 2069 | 3.13 |\n| Comments | 2004 | 3.03 |\n| Git Configuration | 1275 | 1.93 |\n| Deploy Targets | 1079 | 1.63 |\n| After Build Success | 1025 | 1.55 |\n| After Build Script | 602 | 0.91 |\n| Before Deploy | 133 | 0.20 |\n| After Deploy | 79 | 0.12 |\n| Custom Scripting | 40 | 0.06 |\n| After Build Failure | 39 | 0.06 |\n| After Install | 14 | 0.02 |\n| Before Install | 10 | 0.02 |\n| Mysql | 5 | 0.01 |\n| After Build Success | 3 | 0.00 |\n| Allow Failures | 2 | 0.00 |\n\nRQ8: What are some common reasons projects evolve their CI configuration?\n\nTo better understand the changes to the CI configuration files, we analyzed all the changes that were made to the .travis.yml files in our depth corpus. Because YAML is a structured language, we can parse the file and determine which part of the configuration was changed. Table 5 shows the distribution of all the changes. The most common changes were to the build matrix, which in Travis specifies a combination of runtimes, environments, and exclusions/inclusions. For example, a build matrix for a project in Ruby could specify the runtimes rvm 2.2, rvm 1.9, and jruby, the build environments rails2 and rails3, and the exclusions/inclusions, e.g., exclude: jruby with rails2. All combinations will be built except those excluded, so in this example there would be 5 different builds.
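Written out as a .travis.yml, the matrix from this example would look roughly as follows. This is an illustrative sketch rather than a file from the corpus; a real Rails project would more likely express the rails2/rails3 axis with the gemfile: key, but environment variables keep the example self-contained:\n\n```yaml\nlanguage: ruby\nrvm:                  # three runtimes\n  - 2.2\n  - 1.9\n  - jruby\nenv:                  # two build environments\n  - RAILS=rails2\n  - RAILS=rails3\nmatrix:\n  exclude:            # drop the jruby + rails2 combination\n    - rvm: jruby\n      env: RAILS=rails2\n# 3 runtimes x 2 environments - 1 exclusion = 5 builds\n```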
Other common changes included the dependent libraries to install before building the project (what .travis.yml calls before install) and changes to the build scripts themselves. Many other changes were due to version changes of dependencies.\n**RQ9: How long do CI builds take on average?**\n\nAnother cost of using CI is the time to build the application and run all the tests. This cost includes both the energy cost\\(^2\\) of the computing power to run these builds and wasted developer time: developers may have to wait to see if their build passes before they merge in their changes, so longer build times mean more waiting.\n\nThe average build time is just under 500 seconds. To compute the average build times, we first remove all the canceled (incomplete, manually stopped) build results, and only consider the time for errored, failed, and passed builds (i.e., completed builds). Errored builds are those that fail before the build proper begins (e.g., when a dependency cannot be downloaded), and failed builds are those where the build does not complete successfully (e.g., a unit test fails). To further understand the data, we look at each outcome independently. Interestingly, we find that passing builds run faster than either errored or failed builds. The difference between errored and failed is significant (Wilcoxon, \\(p < 0.0001\\)), as is the difference between passed and errored (Wilcoxon, \\(p < 0.0001\\)) and the difference between passed and failed (Wilcoxon, \\(p < 0.0001\\)).\n\nWe find this result surprising, as our intuition is that passing builds should take longer: if an error state is encountered early on, the process can abort and return earlier. Perhaps many of the faster passing builds are not generating a meaningful result and should not have been run. However, more investigation is needed to determine the exact reasons for this.\n\n\\(^2\\)This cost should not be underestimated; our personal correspondence with a Google manager in charge of their CI system TAP reveals that TAP costs millions of dollars just for the computation (not counting the cost of developers who maintain or use TAP).\n\n**4.3 Benefits of CI**\n\nWe first summarize the most commonly touted benefits of CI, as reported by the survey participants. We then analyze empirically whether these benefits are quantifiable in our depth corpus. Thus, we confirm or refute previously held beliefs about the benefits of CI.\n\n**RQ10: Why do open-source projects choose to use CI?**\n\nHaving found that CI is widely used in open-source projects (RQ1), and that CI is most widely used among the most popular projects on GitHub (RQ3), we want to understand why developers choose to use CI. However, why a project uses CI cannot be determined from a code repository. Thus, we answer this question using our survey data.\n\nTable 6 shows the percentage of the respondents who selected particular reasons for using CI. As mentioned before, we built this list of reasons by collecting information from various popular internet sources. The two most popular reasons were \u201cCI makes us less worried about breaking our builds\u201d and \u201cCI helps us catch bugs earlier\u201d. One respondent (P371) added: \u201cActs like a watchdog. You may not run tests, or be careful with merges, but the CI will. 
:)\u201d\n\nMartin Fowler [7] is quoted as saying \u201cContinuous Integration doesn\u2019t get rid of bugs, but it does make them dramatically easier to find and remove.\u201d However, in our survey, very few projects felt that CI actually helped them during the debugging process.\n\nTable 6: Reasons for using CI, as reported by survey participants\n\n| Reason | Percent |\n|------------------------------------------------------------------------|---------|\n| CI makes us less worried about breaking our builds | 87.71 |\n| CI helps us catch bugs earlier | 79.61 |\n| CI allows running our tests in the cloud, freeing up our personal machines | 54.55 |\n| CI helps us deploy more often | 53.32 |\n| CI makes integration easier | 53.07 |\n| CI runs our tests in a real-world staging environment | 46.00 |\n| CI lets us spend less time debugging | 33.66 |\n\n**RQ11: Do projects with CI release more often?**\n\nOne of the more common claims about CI is that it helps projects release more often; e.g., CloudBees\u2019 motto is \u201cDeliver Software Faster\u201d [6]. Over 50% of the respondents from our survey claimed it was a reason why they use CI. We analyze our data to see if we can indeed find evidence that would support this claim.\n\nWe found that projects that use CI do indeed release more often than either (1) the same projects before they used CI or (2) the projects that do not use CI. In order to compare across projects and periods, we calculated the release rate as the number of releases per month. Projects that use CI average .54 releases per month, while projects that do not use CI average .24 releases per month. That is more than double the release rate, and the difference is statistically significant (Wilcoxon, \\(p < 0.00001\\)). To identify the effect of CI, we also compared, for projects that use CI, the release rate both before and after the first CI build. We found that projects that eventually added CI used to release at a rate of .34 releases per month, well below the .54 rate at which they release now with CI. This difference is statistically significant (Wilcoxon, \\(p < 0.00001\\)).\n\nTable 7: Release rate of projects\n\n| Uses Travis | Versions Released per Month |\n|-------------|-----------------------------|\n| Yes | .54 |\n| No | .24 |\n\n**RQ12: Do projects which use CI accept more pull requests?**\n\nFor a project that uses a CI service such as Travis CI, when the CI server builds a pull request, it annotates the pull request on GitHub with a visual cue, such as a green check mark or a red \u2018X\u2019, that shows whether the pull request built successfully on the CI server. Our intuition is that this extra information can help developers better decide whether or not to merge a pull request into their code. To determine if this extra information indeed makes a difference, we compared the pull request acceptance rates between pull requests that have this CI information and pull requests that do not have it, from the depth corpus. Note that projects can exclude some branches of their repository from running on the CI server, so just because a project uses CI on some branch, there is no guarantee that every pull request contains the CI build status information.\n\nTable 8 shows the results for this question.\n\nTable 8: Comparison of pull requests merged for pull requests that had or did not have CI information\n\n| CI Usage | % Pull Requests Merged |\n|----------|------------------------|\n| Using CI | 23 |\n| Not Using CI | 28 |
We found that pull requests without CI information were 5 percentage points more likely to be merged than pull requests with CI information. Our interpretation of this result is that those pull requests have problems which are identified by the CI. By not merging these pull requests, developers can avoid breaking the build. This difference is statistically significant (Fisher\u2019s Exact Test: \\( p < 0.00001 \\)). This also fits with our survey result that developers say that using CI makes them less worried about breaking the build. One respondent (P219) added that CI \u201cPrevents contributors from releasing breaking builds\u201d. By not merging in potentially problematic pull requests, developers can avoid breaking their builds.\n\n**Observation**\n\nCI build status can help developers avoid breaking the build by not merging problematic pull requests into their projects.\n\n**RQ13:** Do pull requests with CI builds get accepted faster (in terms of calendar time)?\n\nOnce a pull request is submitted, the code is not merged until the pull request is accepted. The sooner a pull request is accepted, the sooner the code is merged into the project. In the previous question, we saw that projects using CI accept fewer (i.e., reject or ignore more) pull requests than projects not using CI. In this question, we consider only accepted pull requests, and ask whether there is a difference in the time it takes for projects to accept pull requests with and without CI. One reason developers gave for using CI is that it makes integration easier. One respondent (P183) added \u201cTo be more confident when merging PRs\u201d. If integration is easier, does it then translate into pull requests being integrated faster?\n\nFigure 6 shows the distributions of the time to accept pull requests, with and without CI. To compute these results, we select, from our depth corpus, all the pull requests that were accepted, both with and without build information from the CI server. The mean time with CI is 81 hours, but the median is only 5.2 hours. Similarly, the mean time without CI is 140 hours, but the median is 6.8 hours. Comparing the medians, we find that the median pull request with CI information is accepted 1.6 hours faster than the median pull request without it. This difference is statistically significant (Wilcoxon, \\( p < 0.0000001 \\)).\n\n**Observation**\n\nCI build status can make integrating pull requests faster. When using CI, the median pull request is accepted 1.6 hours sooner.\n\nTable 9: Percentage of builds that succeed by pull request target\n\n| Pull Request Target | Percent Passed Builds |\n|---------------------|-----------------------|\n| Master | 72.03 |\n| Other | 65.36 |\n\n**RQ14:** Do CI builds fail less on master than on other non-master branches?\n\nThe most popular reason that participants gave for using CI was that it helps avoid breaking the build. Thus, we analyze this claim in the depth corpus. Does the data show a difference in the way developers use CI with the master branch vs. the other branches? Is there any difference between how many builds fail on master vs. on the other branches? Perhaps developers take more care when writing a pull request for master than for another branch.\n\nTable 9 shows the percentage of builds that pass in pull requests to the master branch, compared to all other branches.
We found that pull requests are indeed more likely to pass when they are on master.\n\n**Observation**\n\nCI builds on the master branch pass more often than on the other branches.\n\n5. IMPLICATIONS\n\nWe offer practical implications of our findings for researchers, developers, and tool builders.\n\n**Researchers**\n\nRQ1, RQ3, RQ4, RQ5: CI is not a \u201cfad\u201d but is here to stay. Because CI is widely used and more projects are adopting it, yet it has not received much attention from the research community, it is time for researchers to study its use and improve it, e.g., by automating more tasks (such as setting up CI). We believe that researchers can contribute many improvements to the CI process once they understand the current state of the practice in CI.\n\nRQ2: Just as GitHub has become the main gateway for researchers who study software, we believe Travis CI can become the main gateway for researchers who study CI. Travis offers a wealth of CI data, accessible via a public API. Therefore, researchers can maximize their impact by studying a single system.\n\nRQ7, RQ8: We found evidence of frequent evolution of CI configuration files (similar evolution was found for Makefiles [21]), so researchers can focus on providing support for safe automation of changes in configuration files, e.g., via safe refactoring tools.\n\nRQ6, Table 4: The most common reason why developers do not use CI is unfamiliarity with CI, so there is a tremendous opportunity for providing educational resources. We call upon university educators to enrich their software engineering curricula to cover the basic concepts and tooling for CI.\n\n**Developers**\n\nRQ3, Table 3: The data shows that CI is more widely embraced by projects that use dynamically typed languages (e.g., 64% of 2721 Ruby projects use CI, compared with only 20% of 2745 Objective-C projects). To mitigate the lack of a static type system, developers who use dynamically typed languages should use CI to run tests and help catch errors early on.\n\nRQ13: Our analysis of the depth corpus shows that the presence of CI makes it easier to accept contributions in open-source projects, and this was also indicated by several survey respondents, e.g., \u201cCI gives external contributors confidence that they are not breaking the project\u201d (P310). Considering other research [43] that reports a lack of diversity in open-source projects, attracting new contributors is desirable. Thus, projects that aim to diversify their pool of contributors should consider using CI.\n\nRQ7, RQ9: Because the average time for a single CI build is fairly short, and CI configurations are maintainable, it appears that the benefits of CI outweigh the costs. Thus, developers should use CI for their projects.\n\nRQ3, RQ11, RQ12, RQ14: The use of CI correlates with positive outcomes, and CI has been adopted by the most successful projects on GitHub, so developers should consider CI a best practice and should use it as widely as possible.\n\n**Tool Builders**\n\nRQ6: CI helps catch bugs, but not locate them. The CI build logs often bury an important error message among hundreds of lines of raw output. Thus, tool builders that want to improve CI can focus on new ways to integrate fault-localization techniques into CI.\n\nRQ1, RQ7, RQ8: Despite wide adoption, there are many projects that have yet to use CI. Tool builders could parse build files [56] and then generate the configuration files necessary for CI; a sketch of this idea follows.
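As a rough illustration of that last suggestion, a bootstrapping tool could map well-known build files to starter CI configurations. The detection rules and generated snippets below are hypothetical examples for illustration, not an existing tool:\n\n```python\n# Sketch: generate a starter .travis.yml from a project's build files.\n# The detection rules and emitted snippets are illustrative only.\nimport os\nfrom typing import Optional\n\nTEMPLATES = {\n    \"pom.xml\":      \"language: java\\nscript: mvn test\\n\",\n    \"build.gradle\": \"language: java\\nscript: gradle test\\n\",\n    \"package.json\": \"language: node_js\\nnode_js:\\n  - node\\n\",\n    \"setup.py\":     \"language: python\\nscript: python setup.py test\\n\",\n    \"Gemfile\":      \"language: ruby\\nscript: bundle exec rake\\n\",\n}\n\ndef generate_travis_yml(project_dir: str) -> Optional[str]:\n    # Return the first matching starter config, or None if no known\n    # build file is found (setup is then left to the developer).\n    for build_file, snippet in TEMPLATES.items():\n        if os.path.exists(os.path.join(project_dir, build_file)):\n            return snippet\n    return None\n\nif __name__ == \"__main__\":\n    yml = generate_travis_yml(\".\")\n    if yml is not None:\n        with open(\".travis.yml\", \"w\") as f:\n            f.write(yml)\n```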
By automating this process, tool builders can lower the entry barrier for developers who are unfamiliar with CI.\n\n6. THREATS TO VALIDITY\n\n6.1 Construct\n\nAre we asking the right questions? We are interested in assessing the usage of CI in open-source projects. To do this we have focused on what, how, and why questions. We think that these questions have high potential to provide unique insight and value for different stakeholders: developers, tool builders, and researchers.\n\n6.2 Internal\n\nIs there something inherent to how we collect and analyze CI usage data that could skew the accuracy of our results?\n\nOnce a CI server is configured, it will continue to run until it is turned off. This could result in projects configuring a CI server and then ignoring its results as they continue development. However, we think this is unlikely because Travis CI and GitHub are so closely integrated: it would be difficult to ignore the presence of CI given the visual cues shown throughout GitHub when a project uses CI.\n\nSome CI services are run in ways that cannot be detected from the information that is publicly available in the GitHub repository. This means that we could have missed some projects. However, this would only mean that we are underestimating the extent to which CI has been adopted.\n\nOur survey had a 9.8% response rate, so over 90% of our targeted population did not respond. We had no control over who responded, so the survey may suffer from self-selection bias. We think such bias is likely because 92% of our survey participants reported using CI, a much higher proportion than the percentage of projects we observed using CI in the data. To mitigate this, we made the survey short and provided a raffle as an incentive to participate, to get as many responses as possible.\n\n6.3 External\n\nAre our results generalizable to CI usage in general? While we analyzed a large number of open-source repositories, we cannot guarantee that these results will be the same for proprietary (closed-source) software. In fact, we consider it very likely that closed-source projects would be unwilling to send their source over the internet to a CI service, so our intuition is that they would be much more likely to use a local CI solution. Further work should be done to investigate the usage of CI in closed-source projects.\n\nBecause we focused on Travis CI, it could be that other CI services are used differently. As we showed in RQ2, Travis CI was the overwhelming favorite CI service, so by focusing on it we think our results are representative.\n\nAdditionally, we only selected projects from GitHub. Perhaps open-source projects that have custom hosting would also be more likely to have custom CI solutions. More work is needed to determine if these results generalize.\n\n7. RELATED WORK\n\nWe group our related work into three different areas: (i) CI usage, (ii) CI technology, and (iii) related technology.\n\nCI Usage\n\nThe closest work to ours is by Vasilescu et al. [53], who present two main findings. They find that projects that use CI are more effective at merging requests from core members, and that projects that use CI find significantly more bugs. However, the paper explicitly states that it is a preliminary study on only 246 GitHub projects, and it treats CI usage as simply a boolean value.
In contrast, this paper examines 34,544 projects, 1,529,291 builds, and 442 survey responses to provide detailed answers to 14 research questions about CI usage, costs, and benefits.\n\nA tech report from Beller et al. [25] performs an analysis of CI builds on GitHub, specifically focusing on Java and Ruby languages. They answer several research questions about tests, including \u201cHow many tests are executed per build?\u201d, \u201cHow often do tests fail?\u201d, and \u201cDoes integration in different environments lead to different test results?\u201d. These questions however, do not serve to comprehensively support or refute the productivity claims of CI.\n\nTwo other papers [44,46] have analyzed a couple of case studies of CI usage. These are just two case studies total, unlike this paper that analyzes a broad and diverse corpus.\n\nLepp\u00e4nen et al. [45] interviewed developers from 15 software companies about what they perceived as the benefits of CI. They found one of the perceived benefits to be more frequent releases. One of their participants said CI reduced release time from six months to two weeks. Our results confirm that projects that use CI release twice as fast as projects that do not use CI.\n\nBeller et al. [24] find that developers report testing three times more often than they actually do test. This over-reporting shows that CI is needed to ensure tests are actually run. This confirms what one of our respondents (P287) said: \u201cIt forces contributors to run the tests (which they might not otherwise do)\u201d Kochhar et al. [42] found that larger Java open-source projects had lower test coverage rates, also suggesting that CI can be beneficial.\n\nCI technology\n\nSome researchers have proposed approaches to improve CI servers by having servers communicate dependency information [31], generating tests during CI [30], or selecting tests based on code churn [41]. Also researchers [27] have found that integrating build information from various sources can help developers. In our survey, we found that developers do not think that CI helps them locate bugs; this problem has been also pointed out by others [28].\n\nOne of the features of CI systems is that they report the build status so that it is clear to everyone. Downs et al. [32] developed a hardware based system with devices shaped like rabbits which light up with different colors depending on the build status. These devices keep developers informed about the status of the build.\n\nRelated Technology\n\nA foundational technology for CI is build systems. Some ways researchers have tried to improve their performance has been by incremental building [35] as well as optimizing dependency retrieval [29].\n\nPerforming actions continuously can also bring extra value, so researchers have proposed several activities such as continuous test generation [54], continuous testing (continuously running regression tests in the background) [50], continuous compliance [36], and continuous data testing [47].\n\n8. CONCLUSIONS\n\nCI has been rising as a big success story in automated software engineering. In this paper we study the usage, the growth, and the future prospects of CI using data from three complementary sources: (i) 34,544 open-source projects from GitHub, (ii) 1,529,291 builds from the most commonly used CI system, and (iii) 442 survey respondents. Using this rich data, we investigated 14 research questions.\n\nOur results show there are good reasons for the rise of CI. 
Compared to projects that do not use CI, projects that use CI: (i) release twice as often, (ii) accept pull requests faster, and (iii) have developers who are less worried about breaking the build. Therefore, it should come as no surprise that 70% of the most popular projects on GitHub heavily use CI.\n\nThe trends that we discover point to an expected growth of CI. In the future, CI will have an even greater influence than it has today. We hope that this paper provides a call to action for the research community to engage with this important field of automated software engineering.\n\n9. ACKNOWLEDGMENTS\n\nWe thank CloudBees for sharing with us the list of open-source projects using CloudBees, Travis for fixing a bug in their API to enable us to collect all relevant build history, and Amin Alipour, Denis Bogdanas, Mihai Codoban, Alex Gyori, Kory Kraft, Nicholas Lu, Shane McKee, Nicholas Nelson, Semih Okur, August Shi, Sruti Srinivasa Ragavan, and the anonymous reviewers for their valuable comments and suggestions on an earlier version of this paper.\n\nThis work was partially funded through the NSF CCF-1421503, CCF-1439957, and CCF-1553741 grants.\n10. REFERENCES\n\n[1] 7 reasons why you should be using continuous integration. https://about.gitlab.com/2015/02/03/7-reasons-why-you-should-be-using-ci/. Accessed: 2016-04-24.\n\n[2] AppVeyor. https://www.appveyor.com/. Accessed: 2016-04-26.\n\n[3] The benefits of continuous integration. https://blog.codeship.com/benefits-of-continuous-integration/. Accessed: 2016-04-24.\n\n[4] Build in the cloud. http://google-engtools.blogspot.com/2011/08/build-in-cloud-how-build-system-works.html. Accessed: 2016-04-24.\n\n[5] CircleCI. https://circleci.com/. Accessed: 2016-04-26.\n\n[6] CloudBees. http://cloudbees.com/. Accessed: 2016-04-26.\n\n[7] Continuous integration. https://www.thoughtworks.com/continuous-integration. Accessed: 2016-04-24.\n\n[8] Continuous integration is dead. http://www.yegor256.com/2014/10/08/continuous-integration-is-dead.html. Accessed: 2016-04-24.\n\n[9] CruiseControl. http://cruisecontrol.sourceforge.net/. Accessed: 2016-04-21.\n\n[10] CrunchBase. https://www.crunchbase.com/organization/travis-ci#/entity. Accessed: 2016-04-24.\n\n[11] Google Search Trends. https://www.google.com/trends/. Accessed: 2016-04-24.\n\n[12] Jenkins. https://jenkins.io/. Accessed: 2016-04-21.\n\n[13] Restkit. https://github.com/RestKit/RestKit. Accessed: 2016-04-29.\n\n[14] Stackoverflow. http://stackoverflow.com/questions/214695/what-are-some-arguments-against-using-continuous-integration. Accessed: 2016-04-24.\n\n[15] Team Foundation Server. https://www.visualstudio.com/en-us/products/tfs-overview-vs.aspx. Accessed: 2016-04-21.\n\n[16] Tools for software engineers. http://research.microsoft.com/en-us/projects/tse/. Accessed: 2016-04-24.\n\n[17] Travis CI. https://travis-ci.org/. Accessed: 2016-04-21.\n\n[18] Werker. http://wercker.com/. Accessed: 2016-04-26.\n\n[19] Why don\u2019t we use continuous integration? https://blog.inf.ed.ac.uk/sapm/2014/02/14/why-dont-we-use-continuous-integration/. Accessed: 2016-04-24.\n\n[20] Yaml: Yaml ain\u2019t markup language. http://yaml.org/. Accessed: 2016-04-24.\n\n[21] J. M. Al-Kofahi, H. V. Nguyen, A. T. Nguyen, T. T. Nguyen, and T. N. Nguyen. Detecting semantic changes in Makefile build code. In ICSM, 2012.\n\n[22] J. Allspaw and P. Hammond. 10+ deploys per day: Dev and ops cooperation at Flickr. https://www.youtube.com/watch?v=LdOe18KhtT4. Accessed: 2016-04-21.\n\n[23] K. Beck. 
Embracing change with Extreme Programming. Computer, 32(10):70–77, 1999.

[24] M. Beller, G. Gousios, and A. Zaidman. How (much) do developers test? In ICSE, 2015.

[25] M. Beller, G. Gousios, and A. Zaidman. Oops, my tests broke the build: An analysis of Travis CI builds with GitHub. Technical report, PeerJ Preprints, 2016.

[26] G. Booch. Object Oriented Design with Applications. Benjamin-Cummings Publishing Co., Inc., 1991.

[27] M. Brandtner, E. Giger, and H. C. Gall. Supporting continuous integration by mashing-up software quality information. In CSMR-WCRE, 2014.

[28] M. Brandtner, S. C. Müller, P. Leitner, and H. C. Gall. SQA-Profiles: Rule-based activity profiles for continuous integration environments. In SANER, 2015.

[29] A. Celik, A. Knaust, A. Milicevic, and M. Gligoric. Build system with lazy retrieval for Java projects. In FSE, 2016.

[30] J. C. M. de Campos, A. Arcuri, G. Fraser, and R. F. L. M. de Abreu. Continuous test generation: Enhancing continuous integration with automated test generation. In ASE, 2014.

[31] S. Dössinger, R. Mordinyi, and S. Biffl. Communicating continuous integration servers for increasing effectiveness of automated testing. In ASE, 2012.

[32] J. Downs, B. Plimmer, and J. G. Hosking. Ambient awareness of build status in collocated software teams. In ICSE, 2012.

[33] S. Elbaum, G. Rothermel, and J. Penix. Techniques for improving regression testing in continuous integration development environments. In FSE, 2014.

[34] J. Engblom. Virtual to the (near) end: Using virtual platforms for continuous integration. In DAC, 2015.

[35] S. Erdweg, M. Lichter, and M. Weiel. A sound and optimal incremental build system with dynamic dependencies. In OOPSLA, 2015.

[36] B. Fitzgerald, K. J. Stol, R. O'Sullivan, and D. O'Brien. Scaling agile methods to regulated environments: An industry case study. In ICSE, 2013.

[37] M. Fowler. Continuous Integration. http://martinfowler.com/articles/originalContinuousIntegration.html. Accessed: 2016-04-21.

[38] M. Gligoric, L. Eloussi, and D. Marinov. Practical regression test selection with dynamic file dependencies. In ISSTA, 2016.

[39] G. Gousios. The GHTorrent dataset and tool suite. In MSR, 2013.

[40] J. Humble. Evidence and case studies. http://continuousdelivery.com/evidence-case-studies/. Accessed: 2016-04-29.

[41] E. Knauss, M. Staron, W. Meding, O. Söder, A. Nilsson, and M. Castell. Supporting continuous integration by code-churn based test selection. In RCoSE, 2015.

[42] P. S. Kochhar, F. Thung, D. Lo, and J. L. Lawall. An empirical study on the adequacy of testing in open source projects. In APSEC, 2014.

[43] V. Kuechler, C. Gilbertson, and C. Jensen. Gender differences in early free and open source software joining process. In IFIP, 2012.

[44] E. Laukkonen, M. Paasivaara, and T. Arvonen. Stakeholder perceptions of the adoption of continuous integration: A case study. In AGILE, 2015.

[45] M. Leppänen, S. Mäkinen, M. Pagels, V. P. Eloranta, J. Itkonen, M. V. Mäntylä, and T. Männistä. The highways and country roads to continuous deployment. IEEE Software, 2015.

[46] A. Miller. A hundred days of continuous integration. In AGILE, 2008.

[47] K. Muşlu, Y. Brun, and A. Meliou. Data debugging with continuous testing. In FSE, 2013.

[48] VersionOne. 10th annual State of Agile development survey.
https://versionone.com/pdf/VersionOne-10th-Annual-State-of-Agile-Report.pdf, 2016.

[49] Puppet and DevOps Research and Assessment (DORA). 2016 State of DevOps Report. https://puppet.com/system/files/2016-06/2016%20State%20of%20DevOps%20Report.pdf, 2016.

[50] D. Saff and M. D. Ernst. Continuous testing in Eclipse. In ICSE, 2005.

[51] Testing at the speed and scale of Google, Jun 2011. http://google-engtools.blogspot.com/2011/06/testing-at-speed-and-scale-of-google.html.

[52] Tools for continuous integration at Google scale, October 2011. http://www.youtube.com/watch?v=b52aXZ2yi08.

[53] B. Vasilescu, Y. Yu, H. Wang, P. Devanbu, and V. Filkov. Quality and productivity outcomes relating to continuous integration in GitHub. In FSE, 2015.

[54] Z. Xu, M. B. Cohen, W. Motycka, and G. Rothermel. Continuous test suite augmentation in software product lines. In SPLC, 2013.

[55] S. Yoo and M. Harman. Regression testing minimization, selection and prioritization: A survey. STVR, 22(2):67–120, 2012.

[56] S. Zhou, J. M. Al-Kofahi, T. N. Nguyen, C. Kästner, and S. Nadi. Extracting configuration knowledge from build files with symbolic analysis. In RELENG, 2015.
{"id": "1e8f6abb8c9d5859f5882f80fc5f80b161a637f1", "text": "Automating Dependency Updates in Practice: An Exploratory Study on GitHub Dependabot\n\nRunzhi He, Hao He, Yuxia Zhang, Minghui Zhou\n\nAbstract\u2014Dependency management bots automatically open pull requests to update software dependencies on behalf of developers. Early research shows that developers are suspicious of updates performed by dependency management bots and feel tired of overwhelming notifications from these bots. Despite this, dependency management bots are becoming increasingly popular. Such contrast motivates us to investigate Dependabot, currently the most visible bot on GitHub, to reveal the effectiveness and limitations of state-of-art dependency management bots. We use exploratory data analysis and a developer survey to evaluate the effectiveness of Dependabot in keeping dependencies up-to-date, interacting with developers, reducing update suspicion, and reducing notification fatigue. We obtain mixed findings. On the positive side, projects do reduce technical lag after Dependabot adoption and developers are highly receptive to its pull requests. On the negative side, its compatibility scores are too scarce to be effective in reducing update suspicion; developers tend to configure Dependabot toward reducing the number of notifications; and 11.3% of projects have deprecated Dependabot in favor of other alternatives. The survey confirms our findings and provides insights into the key missing features of Dependabot. Based on our findings, we derive and summarize the key characteristics of an ideal dependency management bot which can be grouped into four dimensions: configurability, autonomy, transparency, and self-adaptability.\n\nIndex Terms\u2014Dependency Management, Software Engineering Bot, Dependabot, Mining Software Repositories\n\n1 INTRODUCTION\n\nTo update or not to update, that is the question haunting software engineers for decades. The software engineering \u201cgurus\u201d would argue that keeping software dependencies up-to-date minimizes technical debt, increases supply chain security, and ensures software project sustainability in the long term [1]. Nonetheless, it requires not only substantial effort but also extra responsibility from developers. Consequently, many developers adhere to the practice of \u201cif it ain\u2019t broke, don\u2019t fix it\u201d and the majority of existing software systems use outdated dependencies [2].\n\nOne promising solution for this dilemma is to use bots to automate all dependency updates. Therefore, dependency management bots are invented to automatically open pull requests (PRs) to update dependencies in a collaborative coding platform (e.g., GitHub) in the hope of saving developer effort. Recently, dependency management bots are increasingly visible and gaining high momentum among practitioners. The exemplars of these bots, including Dependabot [3], Renovate Bot [4], PyUp [5], and Synk Bot [6], have opened millions of PRs on GitHub [7] and are adopted by a variety of industry teams (according to their websites).\n\nHowever, the simple idea of using a bot does not save the world. The early work of Mirhosseini and Parnin [8] on Greenkeeper [9] reveals that: only 32% of Greenkeeper PRs are merged because developers are suspicious of whether a bot PR will break their code (i.e., update suspicion) and feel annoyed about a large number of bot PRs (i.e., notification fatigue). 
Since then, similar bots have emerged, evolved, and gained high popularity, among which the most visible one on GitHub is Dependabot [7], with many improvements (Section 2.3). However, it remains unknown to what extent these bots can overcome the two limitations of Greenkeeper identified by Mirhosseini and Parnin [8] in 2017.

To shed light on improving dependency management bots and software engineering bots in general, we present an exploratory study on Dependabot. Our study answers the following four research questions (RQs) to empirically evaluate the effectiveness of Dependabot version update in different dimensions (detailed motivations in Section 3):

- **RQ1:** To what extent does Dependabot reduce the technical lag of a project after its adoption?
- **RQ2:** How actively do developers respond to and merge pull requests opened by Dependabot?
- **RQ3:** How effective is Dependabot's compatibility score in allaying developers' update suspicion?
- **RQ4:** How do projects configure Dependabot for automating dependency updates?

As we find that many projects have deprecated Dependabot in favor of other alternatives, we ask an additional RQ:

- **RQ5:** How do projects deprecate Dependabot and what are the developers' desired features for Dependabot?

To answer the RQs, we sample 1,823 popular and actively maintained GitHub projects as the study subjects. We conduct exploratory data analysis on 502,752 Dependabot PRs from these projects and use a survey of 131 developers to triangulate our findings. Our findings provide empirical characterizations of Dependabot's effectiveness in various dimensions. More importantly, we discover important limitations of Dependabot (a state-of-the-art bot) in overcoming update suspicion and notification fatigue, along with the missing features for overcoming these limitations. Based on the findings, we summarize four key properties of an ideal dependency management bot (i.e., configurability, autonomy, transparency, and self-adaptability) as a roadmap for software engineering researchers and bot designers.

2 Background and Related Work

2.1 Dependency Update

In modern software development, updating dependencies is not only important but also non-trivial. A typical software project may have tens to thousands of dependencies, and each outdated one induces risks [10]. However, each update may contain breaking changes, which can be hard to discover and fix [11]. This situation inspires research into understanding update practices, designing metrics, and inventing approaches to support dependency updates.

Bavota et al. [12] find that updates in the Apache ecosystem are triggered by major changes or a large number of bug fixes, but may be prevented by API removals. Kula et al. [2] discover that 81.5% of the 4,600 studied Java/Maven projects on GitHub still keep outdated dependencies due to lack of awareness and the extra workload. Pashchenko et al. [13] find through semi-structured interviews that developers face trade-offs when updating dependencies (e.g., vulnerabilities, breaking changes, policies).

Researchers have proposed measurements to quantify the "freshness" or "outdatedness" of software dependencies and applied them to various software ecosystems. Cox et al. [14] propose several metrics to quantify "dependency freshness" and evaluate them on a dataset of industrial Java systems.
A series of studies [15], [16], [17], [18], [19], [20] introduce the notion of technical lag, a metric for measuring the extent of project dependencies lagging behind their latest releases, and investigate the evolution of technical lag in Debian [15], npm [16], [17], [18], the Libraries.io dataset [19], and Docker images [20]. They find that technical lag tends to increase over time, induces security risks, and can be mitigated using semantic versioning.\n\nThere has been a long line of research in software engineering for supporting the automated update of software. Since API breaking changes form the majority of update cost, most studies propose automated approaches to match and adapt evolving APIs (e.g., [21], [22], [23], [24], [25]). However, Cossette and Walker [26] reveal through manual analysis that real API adaptation tasks are complex and beyond the capability of previous automated approaches. Recently, research interest in automated API adaptation is surging again with works on Java [27], JavaScript [28], Python [29], Android [30], etc.\n\nOn the other hand, practitioners often take conservative update approaches: upstream developers typically use semantic versioning [31] for signaling version compatibility; downstream developers perform most updates manually and detect incompatibilities through release notes, compilation failures, and regression testing. Unfortunately, studies [32], [33], [34], [35] reveal that none of them work well in guaranteeing update compatibility. Generally, providing such guarantees is still a challenging open problem [36].\n\n2.2 Dependency Management Bots\n\nPerhaps the most noticeable automation effort among practitioners is dependency management bots. These bots automatically create pull requests (PRs) to update dependencies either immediately after a new release is available or when a security vulnerability is discovered in the currently used version. In other words, dependency management bots solve the lack of awareness problem [2] by automatically pushing update notifications to developers.\n\nMirhosseini and Parnin [8] conduct a pioneering study on Greenkeeper and find that developers update dependencies 1.6x more frequently with Greenkeeper, but only 32% of Greenkeeper PRs are merged due to two major limitations:\n\n- **Update Suspicion:** If an automated update PR breaks their code, developers immediately become suspicious of subsequent PRs and are reluctant to merge them.\n- **Notification Fatigue:** If too many automated update PRs are generated, developers may feel annoyed about the notifications and simply ignore all the update PRs.\n\nRombaut et al. [37] find that Greenkeeper issues for in-range breaking updates induce a large maintenance overhead, and many of them are false alarms caused by project CI issues.\n\nThe limitations of Greenkeeper align well with the challenges revealed in the software engineering (SE) bot literature. Wessel et al. [38] find that SE bots on GitHub have interaction problems and provide poor decision-making support. Erlenhov et al. [39] identify two major challenges in \u201cAlex\u201d bot (i.e., SE bots that autonomously perform simple tasks) design: establishing trust and reducing interruption/noise. Wyrich et al. [7] find that bot PRs have a lower merge rate and need more time to be interacted with and merged. Two subsequent studies by Wessel et al. 
[40], [41] qualitatively show that noise is the central challenge in SE bot design but that it can be mitigated by certain design strategies and the use of a "meta-bot." Shihab et al. [42] draw a picture of the technical and socio-economic challenges of SE bots. Santhanam et al. [43] provide a systematic mapping of the SE bot literature.

Since Mirhosseini and Parnin [8], many other bots have emerged for automating dependency updates, such as Dependabot [3] (preview release in May 2017) and Renovate Bot [4] (first release in January 2017). Greenkeeper itself reached end-of-life in June 2020, and its team merged with Snyk Bot [6]. All these bots are widely used: according to Wyrich et al. [7], they opened the vast majority of bot PRs on GitHub (six out of the top seven). The top two spots are occupied by Dependabot [3] and Dependabot Preview [44], with ∼3 million PRs and ∼1.2 million PRs, respectively. Erlenhov et al. [45] find that, under a strict SE bot definition, almost all bots in an existing bot commit dataset [46] are dependency management bots and that they are frequently adopted, discarded, switched, and even simultaneously used by GitHub projects, indicating fierce competition among them.

2.3 Dependabot

Among dependency management bots, Dependabot [3] is the most visible one in GitHub projects [7]. Dependabot Preview was launched in 2017 [47] and acquired by GitHub in 2019 [48]. In August 2021, it was shut down in favor of the new, GitHub-native Dependabot [49], operating since June 2020, which offers two main services:

- **Dependabot version update** [50]: If a configuration file named `dependabot.yml` is added to a GitHub repository, Dependabot will begin to open PRs that update project dependencies to the latest version. Developers can specify the exact Dependabot behavior in `dependabot.yml` (e.g., the update interval and the maximum number of PRs).

- **Dependabot security update** [51]: Dependabot scans the entire GitHub to find repositories with vulnerable dependencies. Even if no `dependabot.yml` is supplied, Dependabot still alerts repository owners, and repository owners can tell Dependabot to open PRs that update vulnerable dependencies to their patched versions.

Figure 1 shows an example Dependabot PR [52]. Among its other details, one especially interesting Dependabot feature is the **compatibility score** badge. According to the GitHub documentation [53]: *An update's compatibility score is the percentage of CI runs that passed when updating between specific versions of the dependency.* In other words, the score uses the large-scale regression testing data available in GitHub CI test results to estimate the risk of breaking changes in a dependency update. This looks like a promising direction for solving the **update suspicion** problem, as previous studies have shown that project test suites are often unreliable in detecting update incompatibilities [34] and that the resulting false alarms introduce significant maintenance overhead [37]. However, the score's effectiveness in practice remains unknown.

For the **notification fatigue** problem, Wessel et al. [40] suggest that SE bots offer flexible configurations and send only relevant notifications. Both solutions have been (in principle) implemented by Dependabot, but it is still unclear whether the specific configuration options and notification strategies taken by Dependabot are really effective in practice. Alfadel et al.
[54] find that developers receive Dependabot security PRs well: 65.42% of PRs are merged and most are merged within a day. However, security PRs constitute only a small portion of Dependabot PRs (6.9% in our dataset), and developers perceive security updates as highly relevant [13]. The effectiveness of Dependabot version update in general seems to be more problematic. Soto-Valero et al. [55] find that Dependabot opens many PRs on bloated dependencies. Cogo and Hassan [56] provide evidence on how the configuration of Dependabot causes issues for developers. As stated by two developers in a GitHub issue [57]: 1) *I think we'd rather manage dependency upgrades ourselves, on our own time. We've been frequently bitten by dependency upgrades causing breakages. We tend to only upgrade dependencies when we're close to being ready to cut a release.* 2) *Also Dependabot tends to be pretty spammy, which is rather annoying.*

To the best of our knowledge, a comprehensive empirical investigation into the adoption of the Dependabot version update service is still lacking. Such knowledge from Dependabot can help the formulation of general design guidelines for dependency management bots and unveil important open challenges for fulfilling these guidelines.

3 RESEARCH QUESTIONS

Our study goal is to evaluate the practical effectiveness of the **Dependabot version update** service. In this Section, we elaborate on the motivation of each RQ toward this goal.

The Dependabot version update service is designed to make developers aware of new versions and help them keep project dependencies up-to-date. To quantitatively evaluate the extent to which Dependabot fulfills its main design purpose (i.e., *keeping dependencies up-to-date*), we reuse metrics from the technical lag literature [16], [18] and ask:

**RQ1:** To what extent does Dependabot reduce the technical lag of a project after its adoption?

To help developers keep dependencies up-to-date, Dependabot intervenes by automatically creating update PRs when new versions become available, after which developers can interact with (e.g., comment on, merge) these PRs. We evaluate the effectiveness of this interaction process by measuring the extent to which developers interact smoothly with Dependabot PRs, forming the next RQ:

**RQ2:** How actively do developers respond to and merge pull requests opened by Dependabot?

One major limitation of Greenkeeper is that developers tend to be suspicious of whether a dependency update will introduce breaking changes [8] (i.e., *update suspicion*). On the other hand, Dependabot helps developers establish confidence in update PRs using the **compatibility score** feature (Section 2.3). To quantitatively evaluate the effectiveness of this feature against update suspicion, we ask:

**RQ3:** How effective is Dependabot's compatibility score in allaying developers' update suspicion?

The other major limitation of Greenkeeper is that developers tend to be overwhelmed by a large number of update PRs [8] (i.e., *notification fatigue*). On the other hand, Dependabot provides flexible configuration options for controlling the amount of notifications (Section 2.3).
To explore how developers configure (and re-configure) the number of notifications generated by Dependabot, we study real-world Dependabot configurations and ask:

**RQ4:** How do projects configure Dependabot for automating dependency updates?

During our analysis, we discover that a non-negligible portion of projects in our studied corpus have deprecated Dependabot and migrated to other alternatives. As an in-depth retrospective analysis of the reasons behind these deprecations can help reveal important Dependabot limitations and future improvement directions, we ask:

**RQ5:** How do projects deprecate Dependabot and what are the developers' desired features for Dependabot?

4 STUDY DESIGN

An overview of our study is shown in Figure 2. The study follows a mixed-methods design in which we obtain results from repository data analysis and triangulate them with a developer survey. In this Section, we introduce the data collection and survey methods. The specific analysis methods will be presented along with their results in Section 5.

4.1 Data Collection

**Project Selection.** As the first step, we need to collect a sample of engineered and maintained GitHub projects that use or once used Dependabot version update in their workflow. We focus on the GitHub-native Dependabot (released on June 1, 2020) and do not include Dependabot Preview in our study because the former provides much richer features and allows us to obtain the latest, state-of-the-art results.

We begin with the latest dump of GHTorrent [58] (released on March 6, 2021), a large-scale dataset of GitHub projects widely used in software engineering research (e.g., [7], [34]). We find a noticeable gap in the GHTorrent dataset from July 2019 to early January 2020 (also observed by Wyrich et al. [7]). Focusing solely on the GitHub-native Dependabot allows us to circumvent threats caused by this gap because all of its PRs were created after January 2020.

We select projects with at least 10 merged Dependabot PRs to keep only projects that have used Dependabot to some degree. To filter out irrelevant, low-quality, or unpopular projects, we retain only non-fork projects with at least 10 stars, as inspired by previous works [55], [59], [60]. Since projects without sustained activity may not perform dependency updates on a regular basis and would induce noise in the technical lag analysis (RQ1), we query the GitHub API [61] and retain projects with a median of at least one commit per week over the past year. To exclude projects that have never utilized Dependabot version update, we clone the repositories and retain only projects with some git change history on `dependabot.yml`. After all the filtering steps, we end up with 1,823 projects.

**PR Collection.** We use the GitHub REST API [61] and a web scraper to find all Dependabot PRs (before February 14, 2022) in these projects and collect PR statistics, CI test results, and timeline events. By leveraging a distributed pool of Cloudflare workers [62], the web scraper enables us to bypass the limitations of the GitHub API (which is unwieldy for collecting CI test results for PRs) and retrieve PR events and CI test results at scale. The PR body tells us which dependency a PR updates, its current version, and its updated version. By the end of this stage, we obtain 540,665 Dependabot PRs (71.1% with a CI test result), updating 15,590 dependencies between 167,841 version pairs.
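For illustration, this extraction step can be sketched as follows. This is a minimal sketch (not the authors' tooling) that pulls the dependency and version pair from a Dependabot PR title; it assumes the common "Bump &lt;dep&gt; from &lt;v1&gt; to &lt;v2&gt;" title pattern, while real PRs have variants (e.g., grouped updates) that would need extra handling.

```python
# Minimal sketch: extract (dependency, old_version, new_version) from a
# Dependabot PR title. Only the common "Bump ... from ... to ..." pattern
# is handled here; real titles and PR bodies have more variants.
import re

TITLE_RE = re.compile(r"^Bump (?P<dep>\S+) from (?P<old>\S+) to (?P<new>\S+)")

def parse_title(title: str):
    m = TITLE_RE.match(title)
    return (m["dep"], m["old"], m["new"]) if m else None

print(parse_title("Bump lodash from 4.17.20 to 4.17.21"))
# ('lodash', '4.17.20', '4.17.21')
```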
Our next task is to identify security updates among all of the PRs created by Dependabot. However, Dependabot no longer labels security updates, for security reasons. Instead, Dependabot shows a banner on the PR web page which is only visible to repository administrators by default [51]. Therefore, we construct a mirror of the GitHub security advisory database [63] and identify security PRs ourselves by checking whether the PR updates a version with a vulnerability entry at the time of PR creation. More specifically, we identify a PR as a security update PR if: 1) the dependency and its current version match a vulnerability in the GitHub security advisory database; 2) the updated version is newer than the version that fixes this vulnerability (i.e., no vulnerability remains after the update); 3) the PR was created after the vulnerability's disclosure in CVE. Eventually, we identify 37,313 security update PRs (6.9%) among the 540,665 Dependabot PRs.
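As a rough illustration of this three-criteria check, the following minimal sketch (not the authors' implementation) classifies a PR against a local mirror of the advisory database; the advisory and PR field names and the `version_lt` helper are hypothetical simplifications.

```python
# Minimal sketch of the three-criteria security-PR check. `advisories` stands
# in for a local mirror of the GitHub Advisory Database; all field names are
# hypothetical simplifications of the real schema.
from packaging.version import Version

def version_lt(a: str, b: str) -> bool:
    return Version(a) < Version(b)

def is_security_pr(pr: dict, advisories: list[dict]) -> bool:
    for adv in advisories:
        if adv["package"] != pr["dependency"]:
            continue
        # 1) the current version matches a known vulnerability
        if pr["old_version"] not in adv["vulnerable_versions"]:
            continue
        # 2) the updated version is at least the fixed version
        #    (no vulnerability remains after the update)
        if version_lt(pr["new_version"], adv["fixed_version"]):
            continue
        # 3) the PR was created after the CVE disclosure
        if pr["created_at"] > adv["cve_disclosed_at"]:
            return True
    return False
```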
**Dataset Overview.** As illustrated in Table 1, projects in our dataset are mostly engineered, popular GitHub projects with a large code base, active maintenance, rich development history, and frequent Dependabot usage. We notice a long-tail distribution in the metrics concerning the size of the project, i.e., number of contributors, lines of code, and commit frequency, which is expected and common in most mining software repository (MSR) datasets [35], [64], [65].

The plurality (44.1%) of projects in our dataset use the npm package ecosystem, followed by Maven (12.3%), PyPI (11.7%), and Go modules (7.8%). Among the Dependabot PRs, those that update npm packages constitute an even higher portion (64.9%), followed by PyPI (8.9%), Go modules (4.3%), Bundler (3.9%), and Maven (3.9%), as packages in the npm ecosystem generally evolve faster [66].

Dependabot has opened hundreds of PRs for most of the projects (mean = 304, median = 204), even up to thousands for some of them. This likely indicates a high workload for project maintainers. In terms of the most updated dependencies, it is not surprising that all

TABLE 1: Statistics of the Studied Projects (top) and the Survey Respondents (bottom)

| Statistics | Mean | Median |
|-----------------------------|--------|--------|
| # of Stars | 1423.92 | 66.00 |
| # of Commits | 2837.11 | 1040.50 |
| # of Contributors | 26.50 | 12.00 |
| Lines of Code (thousands) | 98.18 | 19.89 |
| # of Commits per Week | 10.07 | 4.00 |
| Age at Adoption (days) | 1018.18 | 714.00 |
| # of Dependabot PRs | 304.56 | 204.00 |
| *Survey respondents:* | | |
| # of Dependabot Interactions | 644.54 | 410.00 |
| # of Commits | 477.00 | 331.50 |
| # of Followers | 168.00 | 53.50 |
| Years of Experience (GitHub) | 10.37 | 10.68 |

TABLE 2: Survey Questions and Their Results (131 Responses in Total)

| 5-Point Likert-Scale Questions | Avg. |
|--------------------------------|------|
| (RQ1) Dependabot helps my project keep all dependencies up-to-date. | 4.44 |
| (RQ2) Dependabot PRs do not require much work to review and merge. | 3.94 |
| (RQ2) I respond to a Dependabot PR fast if it can be safely merged. | 4.42 |
| (RQ2) I ignore the Dependabot PR or respond slower if it cannot be safely merged. | 3.78 |
| (RQ2) I handle a Dependabot PR with higher priority if it updates a vulnerable dependency. | 4.19 |
| (RQ2) It requires more work to review and merge a Dependabot PR if it updates a vulnerable dependency. | 2.49 |
| (RQ2) Dependabot often opens more PRs than I can handle. | 2.73 |
| (RQ3) Compatibility scores are often available in Dependabot PRs. | 2.95 |
| (RQ3) If a compatibility score is available, it is effective in indicating whether the update will break my code. | 2.95 |
| (RQ4) Dependabot can be configured to fit the needs of my project. | 3.54 |
| (RQ4) I configure Dependabot to make it less noisy (i.e., only update certain dependencies, scan less frequently, etc.) | 3.27 |

Multiple Choice Questions

| (RQ5) Are your GitHub repositories still using Dependabot for automating version updates? | 0.89 |
| (RQ5) If not, why? | (Results in § 5.5) |

Open-Ended Questions*

| (RQ5) Regardless of current availability, what are the features you want most for a bot that updates dependencies? Do you have any further opinions or suggestions? | (Results in § 5.5) |

* Where appropriate, we also use evidence from open-ended question responses to support the results in RQ1 - RQ4.

4. The survey has been approved by the Ethics Committee of Key Laboratory of High Confidence Software Technology, Ministry of Education (Peking University) under Grant No. CS20220011.

We send each candidate a personalized email (with information about how they used Dependabot) to avoid being perceived as spam. We try our best to follow common survey ethics [71], e.g., clearly introducing the purpose of the survey and being transparent about what we will do with the responses. To increase the chance of getting a response and to contribute back to the open-source community, we offer to donate $5 to an open-source project of the respondents' choice if they opt in. Therefore, we believe we have done minimal harm to the open-source developers we contacted, and the results we get about Dependabot far outweigh the harm. In fact, we receive several highly welcoming responses from the survey participants, such as: 1) keep up the good work! 2) If you would like to consult more, just ping me on <email>...Cheers!

The bottom half of Table 1 summarizes the demographics of the 131 survey respondents, showing that they are highly experienced with both Dependabot (a median of 410 interactions) and open source development (five to 15 years of experience, hundreds of commits, and many followers).

5 METHODS AND RESULTS

### 5.1 RQ1: Technical Lag

#### 5.1.1 Repository Analysis Methods

We evaluate the effectiveness of Dependabot version updates by comparing the project technical lag at two time points: the day of Dependabot adoption (\( T_0 \)) and 90 days after adoption (i.e., \( T_0 + 90 \)). We choose 90 days as the interval to avoid the influence of deprecations, since more than 85% of them happen later than 90 days after adoption. Since technical lag naturally increases over time [16], [18], we include an additional time point for comparison: 90 days before adoption (i.e., \( T_0 - 90 \)).

For a project \( p \) at time \( t \in \{T_0 - 90, T_0, T_0 + 90\} \), we denote all its direct dependencies as \( \text{deps}(p, t) \) and define the technical lag of project \( p \) at time \( t \) as:

$$\text{techlag}(p, t) = \frac{\sum_{d \in \text{deps}(p, t)} \max(0,\; t_{\text{latest}}(d) - t_{\text{adopted}}(d))}{|\text{deps}(p, t)|}$$

Here \( t_{\text{latest}}(d) \) denotes the release time of \( d \)'s latest version at time \( t \) and \( t_{\text{adopted}}(d) \) denotes the release time of \( d \)'s adopted version. We use \( \max \) to guard against the occasional case of \( t_{\text{latest}}(d) < t_{\text{adopted}}(d) \) (e.g., developers may continue to release 0.9.x versions after the release of 1.0.0).
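To make the definition concrete, here is a minimal sketch of the metric under simplifying assumptions: release times have been looked up in advance, and the dictionary inputs are hypothetical helpers rather than the authors' implementation.

```python
# Minimal sketch of the technical-lag metric defined above. `deps` maps each
# direct dependency to the release time of its adopted version; `latest` maps
# it to the release time of its latest version at the observation time.
from datetime import datetime

def techlag(deps: dict[str, datetime], latest: dict[str, datetime]) -> float:
    """Mean lag (in days) over a project's direct dependencies."""
    if not deps:
        return 0.0
    total = 0.0
    for d, t_adopted in deps.items():
        # max(0, ...) guards against adopted versions released *after* the
        # latest version (e.g., 0.9.x releases continuing after 1.0.0)
        total += max(0.0, (latest[d] - t_adopted).total_seconds()) / 86400
    return total / len(deps)

deps = {"lodash": datetime(2021, 1, 10), "react": datetime(2021, 3, 1)}
latest = {"lodash": datetime(2021, 2, 19), "react": datetime(2021, 2, 1)}
print(round(techlag(deps, latest), 2))  # lodash lags 40 days, react 0 -> 20.0
```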
This technical lag definition is inspired by Zerouali et al. [18] but with several adjustments. First, we use only their time-based variant instead of their version-based variant because cross-project comparisons would not be intuitive using the latter. Second, we use the mean value over all dependencies instead of the maximum or median as the overall technical lag, because we intend to measure the overall effectiveness of Dependabot both for keeping most dependencies up-to-date and for eliminating the most outdated ones.

We exclude projects with an age of fewer than 90 days at Dependabot adoption and projects that deprecate Dependabot within 90 days. We also exclude projects that migrated from Dependabot Preview since they may introduce bias into the results. Since the computation of technical lag based on dependency specification files and version numbers requires non-trivial implementation work for each package ecosystem, we limit our analysis to JavaScript/npm, the most popular ecosystem in our dataset. We further exclude projects with no eligible npm dependencies configured for Dependabot at \( T_0 - 90 \), \( T_0 \), or \( T_0 + 90 \). After all the filtering, we retain 613 projects for answering RQ1.

We adopt the Regression Discontinuity Design (RDD) framework to estimate the impact of adopting Dependabot on project technical lag. RDD uses the level of discontinuity before/after an intervention to measure its effect size while taking the influence of an overall background trend into consideration. Given that technical lag tends to increase naturally over time [16], [17], [18], RDD is a more appropriate statistical modeling approach for our case than hypothesis testing approaches (e.g., one-sided Wilcoxon rank-sum tests). Following previous SE works that utilized RDD [72], [73], we use sharp RDD, i.e., segmented regression analysis of interrupted time series data. We treat project-level technical lag as a time series, compute the technical lag for each project every 15 days from \( T_0 - 90 \) to \( T_0 + 90 \), use ordinary least squares regression to fit the RDD model, and watch for the presence of a discontinuity at Dependabot adoption, formalized as the following model:

\[
y_i = \alpha + \beta \cdot \text{time}_i + \gamma \cdot \text{intervention}_i + \theta \cdot \text{time\_after\_intervention}_i + \sigma_i
\]

Here \( y_i \) denotes the output variable (i.e., technical lag for each project in our case); \( \text{time} \) stands for the number of days from \( T_0 - 90 \); \( \text{intervention} \) binarizes the presence of Dependabot (0 before adopting Dependabot, 1 after adoption); \( \text{time\_after\_intervention} \) counts the number of days from \( T_0 \) (0 when \( T_0 - 90 \leq \text{time} < T_0 \)).
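This model can be fit with off-the-shelf OLS. Below is a minimal sketch on synthetic data (the column names and the synthetic lag function are our own; the real input would be one technical-lag observation per project every 15 days):

```python
# Minimal sketch of the sharp RDD (segmented regression) model above, fit
# with statsmodels OLS on synthetic data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
days = np.tile(np.arange(0, 181, 15), 50)            # T0-90 .. T0+90, 50 projects
df = pd.DataFrame({"time": days})
df["intervention"] = (df["time"] >= 90).astype(int)  # 1 after adoption (day 90)
df["time_after_intervention"] = np.where(df["time"] >= 90, df["time"] - 90, 0)
# synthetic lag: slight downward trend plus a drop at adoption
df["techlag"] = (60 - 0.05 * df["time"] - 30 * df["intervention"]
                 - 0.1 * df["time_after_intervention"]
                 + rng.normal(0, 5, len(df)))

model = smf.ols("techlag ~ time + intervention + time_after_intervention", df).fit()
print(model.params["intervention"])  # estimated discontinuity at adoption
```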
#### 5.1.2 Repository Analysis Results

We present technical lags and their deltas between time points in Table 3, and plot diagrams in Figure 3 to reflect how different projects increase or decrease their technical lag from \( T_0 - 90 \) to \( T_0 + 90 \). The first surprising fact we notice is that the technical lag of approximately one-third (216/613) of the projects is already decreasing between \( T_0 - 90 \) and \( T_0 \), even though technical lag tends to increase over time [16], [18]. This indicates that these projects were already taking a proactive dependency update strategy even before adopting Dependabot. On the other hand, for about half (303/613) of the projects, the technical lag increases prior to Dependabot adoption, and 94 projects keep their technical lag unchanged. For all projects, the mean and median technical lag at \( T_0 - 90 \) is 73.68 and 16.27 days, respectively; they decrease at \( T_0 \) to 48.99 and 13.96 days, respectively; at \( T_0 \), 159 (25.9%) of the 613 projects have already achieved zero technical lag.

Between \( T_0 \) and \( T_0 + 90 \), projects lower their technical lag even further, from a mean of 48.99 days and a median of 13.96 days to a mean of 25.38 days and a median of 3.62 days. Among the 303 projects with an increasing technical lag between \( T_0 - 90 \) and \( T_0 \), about two-thirds (220) see a decrease after adopting Dependabot; among the 216 projects with decreasing technical lag, nearly half (94) see a further decrease. More than one-third (219, 35.7%) of the projects achieve completely zero technical lag 90 days after Dependabot adoption. Although there are still some increases, their magnitude is much smaller (e.g., a 75% quantile of only +1.75 days between \( T_0 \) and \( T_0 + 90 \), compared with a 75% quantile of +14.37 days between \( T_0 - 90 \) and \( T_0 \)).

Table 4 shows that the regression variable \( \text{intervention} \) has a statistically significant negative coefficient (coef. = -31.2137, \( p < 0.001 \)), indicating that the adoption of Dependabot might have reduced technical lag and kept dependencies up-to-date in the sampled 613 projects. A more straightforward look at this trend can be observed in Figure 4: at \( T_0 \), project-level technical lag has a noticeable decrease, and there is a discontinuity between the linearly fitted technical lag before/after adoption. \( \text{time} \) and \( \text{time}_{\text{after intervention}} \) have negative coefficients, echoing our earlier findings: the technical lag of the sampled projects was already decreasing before Dependabot adoption, and the introduction of Dependabot adds to this decreasing trend. However, neither coefficient is comparable in magnitude to that of \( \text{intervention} \), and neither is statistically significant (\( p > 0.3 \)).

TABLE 3: Technical Lag (days) for 613 npm Projects

| Metric | Mean | Median |
|--------|------|--------|
| \( \text{techlag}(p, T_0 - 90) \) | 73.68 | 16.27 |
| \( \Delta \) in between | -24.96 | 0.00 |
| \( \text{techlag}(p, T_0) \) | 48.99 | 13.96 |
| \( \Delta \) in between | -23.61 | -0.61 |
| \( \text{techlag}(p, T_0 + 90) \) | 25.38 | 3.62 |

TABLE 4: The Estimated Coefficients and Significance Levels for the RDD Model We Fit (Section 5.1.1)

| Feature | Coef. | Std. Err. | \( t \) | \( p \) |
|---------|-------|-----------|------|------|
| Intercept* | 66.5209 | 4.595 | 14.477 | 0.000 |
| \( \text{intervention} \) | -31.2137 | 5.694 | -5.306 | 0.000 |
| \( \text{time} \) | -0.0743 | 0.079 | -0.945 | 0.345 |
| \( \text{time}_{\text{after intervention}} \) | -0.1011 | 0.100 | -1.008 | 0.314 |

* \( p < 0.001 \)

#### 5.1.3 Triangulation from Survey

Most developers agree that Dependabot is helpful in keeping their project dependencies up-to-date: 55.8% responded with **Strongly Agree** and 35.7% with **Agree** (Table 2).
As noted by one developer: **Dependabot does a great job of keeping my repositories current.** This is because Dependabot serves well as an automated notification mechanism that tells them about the presence of new versions and pushes them to update their dependencies. As mentioned by two developers: 1) **Dependabot is a wonderful way for me to learn about major/minor updates to libraries.** 2) **Dependabot can be a bit noisy, but it makes me aware of my dependencies.**

However, some developers do not favor using Dependabot for automating dependency updates and only use it as a notification mechanism. For example: 1) **I just use it for notifications about updates, but do them manually and check if anything broke in the process.** 2) **I am just using Dependabot to tell me if there is something to update and then update all in a single shot with plain package managers.**

This indicates that they do not trust the reliability of Dependabot for automating updates and do not think the current design of Dependabot can help them reduce the manual workload of updates. As an example, one developer states that: **Dependency management is currently much easier just utilizing yarn/npm. We use Dependabot merely because it has been recommended, but updating dependencies was faster when I solely used the command line.**

One developer suggests that using Dependabot only for update notifications has become such a common use case that they would prefer a dedicated, less noisy tool solely designed for this purpose: **It (Dependabot) becomes more like an update notification, i.e. I'm leveraging only half of its capability. Could there be something designed solely for this purpose? Less invasive, more informative, and instead of creating a PR for every package's update, I would like to see a panel-style hub to collect all the information for me to get a better overview in one place.**

---

**Findings for RQ1:**

90 days after adopting Dependabot, projects decrease their technical lag from an average of 48.99 days to an average of 25.38 days, and 35.7% of projects achieve zero technical lag. The adoption of Dependabot is a statistically significant intervention as indicated by RDD. Developers agree on Dependabot's effectiveness in notifying them about updates, but question its effectiveness in automating updates.

### 5.2 RQ2: Developers' Response to Pull Requests

#### 5.2.1 Repository Analysis Methods

Inspired by prior works [7], [54], we use the following metrics to measure the receptiveness (i.e., how actively developers merge) and responsiveness (i.e., how actively developers respond) of Dependabot PRs:

- **Merge Rate**: The proportion of merged PRs.
- **Merge Lag**: The time it takes for a PR to be merged.
- **Close Lag**: The time it takes for a PR to be closed (i.e., not merged into the project code base).
- **Response Lag**: The time it takes for a PR to receive human interaction, including any observable action in the PR's timeline, e.g., adding a label or assigning a reviewer.

The merge rate is intended to measure receptiveness; the latter three are intended to measure responsiveness.

We assume that results may differ for PRs in different groups. We expect that 1) developers are both more receptive and more responsive to security updates due to the higher priority of eliminating security vulnerabilities; and 2) projects that use Dependabot version update (i.e., contain `dependabot.yml`) are more responsive to Dependabot PRs. To verify these expectations, we divide PRs into three groups:

- **regular**: Dependabot PRs that update a package to its latest version when the old version does not contain any known security vulnerabilities.
- **sec/conf**: Security PRs that update a package with vulnerabilities to its patched version and are opened when the project has a `dependabot.yml` file in its repository (i.e., using Dependabot version update).
- **sec/nconf**: Security PRs opened when the project does not have a `dependabot.yml` file in its repository. These PRs are opened either before the adoption or after the deprecation of Dependabot version update.

We examine the significance of inter-group metric differences with unpaired Mann-Whitney tests and Cliff's delta (\( \delta \)). Following Romano et al. [74], we consider the effect size negligible for \( |\delta| \in [0, 0.147] \), small for \( |\delta| \in (0.147, 0.33] \), medium for \( |\delta| \in (0.33, 0.474] \), and large otherwise.
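For illustration, such a comparison can be sketched as follows; the merge-lag samples are made up, and the quadratic-time Cliff's delta is a naive implementation:

```python
# Minimal sketch of the inter-group comparison: an unpaired Mann-Whitney U
# test plus Cliff's delta on hypothetical merge-lag samples (in days).
from itertools import product
from scipy.stats import mannwhitneyu

def cliffs_delta(xs, ys):
    """delta = P(x > y) - P(x < y), computed naively over all pairs."""
    gt = sum(1 for x, y in product(xs, ys) if x > y)
    lt = sum(1 for x, y in product(xs, ys) if x < y)
    return (gt - lt) / (len(xs) * len(ys))

regular_lag = [0.1, 0.2, 0.2, 0.5, 1.0, 3.0]
security_lag = [0.2, 0.8, 1.5, 4.0, 9.0, 12.0]

stat, p = mannwhitneyu(security_lag, regular_lag, alternative="two-sided")
print(f"U={stat}, p={p:.3f}, delta={cliffs_delta(security_lag, regular_lag):.2f}")
```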
#### 5.2.2 Repository Analysis Results

Table 5 shows the PR statistics we obtain for each group. The high merge rates (>70%) indicate that the projects are highly receptive to Dependabot PRs regardless of whether they are security-related. They are even more receptive to security PRs: their merge rate is 74.53%, higher than the 65.42% reported for Dependabot Preview security updates [54]. This may be because projects welcome security updates even more, or simply an artifact of the projects we selected.

TABLE 5: PR Statistics in Different Groups. All lags are measured in days. $\bar{x}$ represents the mean and $\mu$ represents the median over all PRs in each group.

| Statistics | regular | sec/conf | sec/nconf |
|----------------|---------|----------|-----------|
| # of PRs | 502,752 | 13,406 | 23,907 |
| Merge Rate | 70.13% | 73.71% | 76.01% |
| Merge Lag | $\bar{x}=1.76$, $\mu=0.18$ | $\bar{x}=3.45$, $\mu=0.18$ | $\bar{x}=8.15$, $\mu=0.76$ |
| Close Lag | $\bar{x}=8.63$, $\mu=3.00$ | $\bar{x}=14.42$, $\mu=5.00$ | $\bar{x}=26.83$, $\mu=5.71$ |
| Resp. Lag | $\bar{x}=2.27$, $\mu=0.17$ | $\bar{x}=3.74$, $\mu=0.17$ | $\bar{x}=8.59$, $\mu=0.51$ |

Alfadel et al. [54] find that Dependabot security PRs take longer to close than to merge. Our data illustrate a similar story: regular Dependabot PRs take a median of 0.18 days (≈ four hours) to merge and a median of 3.00 days to close. The difference is statistically significant with a large effect size ($p < 0.001$, $\delta = 0.91$).

The response lag, however, does not differ much from the merge lag in any group, which confirms the timeliness of developers' responses to Dependabot PRs. We observe human activity in 360,126 (72.2%) Dependabot PRs, of which 280,276 (77.8%) are responded to within one day. However, this also indicates an inconsistency between fast responses and slow closes. For a glance at what causes this inconsistency, we sample ten closed PRs with developer activity before closing and inspect their event history. We find that 9 out of 10 PRs are closed by Dependabot itself because the PR became obsolete due to the release of a newer version or a manual upgrade (similar to the observation by Alfadel et al. [54]).
Activities are development-related (e.g., starting a discussion, assigning reviewers) in 5 of these PRs, while the rest are interactions with Dependabot (e.g., @dependabot rebase).

Surprisingly, security PRs require a longer time to merge ($p < 0.001$, $\delta = 0.87$), close ($p < 0.001$, $\delta = 0.72$), and respond to ($p < 0.001$, $\delta = 0.87$), with large effect sizes, regardless of whether the project is using Dependabot version update. Though Dependabot version update users do process security updates more quickly (at least the merge lag and response lag are noticeably shorter), this difference is not significant, with negligible or small effect sizes ($\delta \leq 0.23$).

#### 5.2.3 Triangulation from Survey

In general, developers agree that Dependabot PRs do not require much work to review and merge (34.1% Strongly Agree, 40.3% Agree, 14.0% Neutral).

We find that they follow two different patterns of using Dependabot. One pattern is to rapidly merge the PR if the tests pass and to perform the update manually otherwise (65.2% Strongly Agree, 19.7% Agree, 9.1% Neutral). In the latter case, they respond to the Dependabot PR more slowly, or let Dependabot automatically close the PR after the manual update (36.4% Strongly Agree, 26.5% Agree, 20.5% Neutral). For example: I almost never have to look at Dependabot PRs because I have tests, and 99.99% of PRs are merged automatically. Rarely (when dependency changes API for example) I have to manually add some fixes/updates... As mentioned in Section 5.1.3, another pattern is to use Dependabot PRs solely as a way of notification and always perform manual updates. Both cases contribute to the much larger close lag we observe in Dependabot PRs.

In terms of security updates, most developers do handle security PRs with higher priority (56.7% Strongly Agree, 16.3% Agree, 14.0% Neutral), but they do not think security PRs require more work to review and merge (19.4% Totally Disagree, 36.4% Disagree, 26.4% Neutral). One possible explanation for the slower response, merge, and close of security PRs is that developers consider some security vulnerabilities irrelevant to them: I want it (Dependabot) to ignore security vulnerabilities in development dependencies that don't actually get used in production.

Developers have mixed opinions on whether Dependabot opens more PRs than they can handle (15.9% Strongly Agree, 15.2% Agree, 22.0% Neutral, 20.5% Disagree, 26.5% Totally Disagree). Whether the PR workload introduced by Dependabot is acceptable may depend on other factors (e.g., the number of dependencies and how fast packages evolve), as indicated by two respondents: 1) The performance of Dependabot or other similar bots could depend on the number of dependencies a project has. For smaller projects, with a handful of dependencies, Dependabot will be less noisy and usually safe as compared to large projects with a lot of dependencies. 2) The utility of something like Dependabot depends heavily on the stack and number of dependencies you have. JS is much more noisy than Ruby, for example, because Ruby moves more slowly.

---

**Findings for RQ2:**

More than 70% of Dependabot PRs are merged, with a median merge lag of four hours. Compared with regular PRs, developers are less responsive (more time to respond, close, or merge) but more receptive (higher merge rate) to security PRs.
Developers tend to rapidly merge PRs they consider "safe" and perform manual updates for the remaining PRs.

### 5.3 RQ3: Compatibility Score

#### 5.3.1 Repository Analysis Methods

We explore the effectiveness of compatibility scores in two aspects: availability and correlation with merge rate.

1) Availability: We begin our analysis by examining the data availability of compatibility scores, since they cannot take effect if they are absent from most of the PRs. For this purpose, we obtain compatibility scores from badges in PR bodies, which point to URLs defined per dependency version pair. That is, Dependabot computes one compatibility score for each dependency version pair \( \langle d, v_1, v_2 \rangle \) and shows the score in all PRs that update dependency \( d \) from \( v_1 \) to \( v_2 \). In case this computation fails, Dependabot generates an unknown compatibility score for \( \langle d, v_1, v_2 \rangle \).

Since compatibility scores are computed in a data-driven manner, we wonder whether the popularity of the updated dependencies affects their availability. As a quick evaluation, we sample 20 npm dependencies with more than one million downloads per week as representatives of popular dependencies. Next, we retrieve the release history of these dependencies by querying the npm registry API, retaining only releases that became available after January 1, 2020 (recall that all Dependabot PRs in our dataset were created after January 2020, Section 4). For the releases of each dependency, we generate all possible dependency version pairs from a Cartesian product (1,629 in total) and query their compatibility scores from the corresponding Dependabot URLs.

2) Correlation with Merge Rate: In theory, if developers perceive compatibility scores as reliable, PRs with higher compatibility scores will be more likely to get merged. To quantitatively evaluate this, we compare merge rates for PRs with different compatibility scores. Since PRs that update the same version pair share the same score, we further utilize Spearman's \( \rho \) to measure the correlation between a) the compatibility score for a dependency version pair \( \langle d, v_1, v_2 \rangle \), and b) the merge rate over all PRs that update \( d \) from \( v_1 \) to \( v_2 \) (see the sketch below).

As we will show in Section 5.3.2, compatibility scores are abnormally scarce. Although we reached out to the Dependabot maintainers for explanations, they claim such information to be confidential and declined to share any details. We therefore compute the number of CI test results for each dependency version pair and analyze their overall distribution to provide possible explanations for this scarcity.
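A minimal sketch of this correlation computation, on hypothetical per-version-pair aggregates, could look like:

```python
# Minimal sketch: Spearman's rho between the compatibility score of a
# dependency version pair and the merge rate of its PRs. `pairs` stands in
# for aggregated data, one row per <d, v1, v2>.
from scipy.stats import spearmanr

pairs = [
    {"score": 100, "merged": 80, "total": 100},
    {"score": 96,  "merged": 42, "total": 50},
    {"score": 88,  "merged": 27, "total": 40},
    {"score": 75,  "merged": 6,  "total": 20},
]
scores = [p["score"] for p in pairs]
merge_rates = [p["merged"] / p["total"] for p in pairs]
rho, p_value = spearmanr(scores, merge_rates)
print(f"rho={rho:.2f}, p={p_value:.3f}")
```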
#### 5.3.2 Repository Analysis Results

1) Availability: Compatibility scores are extremely scarce: only 3.4% of the PRs and 0.5% of the dependency version pairs have a compatibility score other than unknown. Merely 0.18% of the dependency version pairs have a value other than 100%. The scarcity is no better even among the most popular npm dependencies: 1,604 (98.5%) of the 1,629 dependency version pairs we sample only have a compatibility score of unknown, 10 (0.6%) have a compatibility score of 100%, and 15 (0.9%) have a compatibility score less than 100%. As an example, we plot a compatibility score matrix for axios, which has the most (15) version pairs with compatibility scores, in Figure 5a.

(Fig. 5: Distribution of (a) compatibility scores and (b) available CI test results over the version pairs of axios.)

2) Correlation with Merge Rate: We summarize the merge rates for PRs with different compatibility scores in Table 6. We can observe that for PRs with a compatibility score, a high score indeed increases their chance of being merged: if the score is higher than 90%, developers are more likely to merge the PR. By contrast, if the score is lower than 80%, developers become very unlikely (30.20%) to merge. The Spearman's \( \rho \) between compatibility score and merge rate is 0.37 (\( p < 0.001 \)), indicating a weak correlation according to Prion and Haerling's interpretation [75].

TABLE 6: Compatibility Score and PR Merge Rate

| Compatibility Score | # of PRs | Merge Rate |
|---------------------|---------|------------|
| unknown | 485,501 | 69.96% |
| < 80% | 1,321 | 30.20% |
| [80%, 90%) | 1,605 | 67.48% |
| [90%, 95%) | 1,794 | 73.19% |
| [95%, 100%) | 2,228 | 84.43% |
| = 100% | 10,303 | 80.30% |

Figure 6 shows the number of dependency version pairs with more than \( x \) CI test results. We observe an extreme Pareto-like distribution: of the 167,053 dependency version pairs in our dataset, fewer than 1,000 have more than 50 CI test results and fewer than 100 have more than 150 CI test results. For the case of axios (Figure 5b), compatibility scores are indeed only available for version pairs with available CI test results. It is hard to explain why the scores are missing even for some version pairs with many CI test results (e.g., the update from 0.19.2 to 0.20.0), as we do not know the underlying implementation details.

#### 5.3.3 Triangulation from Survey

Developers have diverging opinions on whether compatibility scores are available (7% Strongly Agree, 24.8% Agree, 38.8% Neutral, 17.8% Disagree, 11.6% Totally Disagree) and whether compatibility scores are effective if they are available (4.7% Strongly Agree, 21.7% Agree, 45.7% Neutral, 19.4% Disagree, and 8.5% Totally Disagree). The answer distributions and the high number of Neutral responses likely indicate that many developers do not know how to rate the two statements [76], because compatibility scores are too scarce and most developers have not been exposed to this feature. As one developer replied: Compatibility scores and vulnerable dependencies detection are great, I use Dependabot a lot but was not aware they exist...(They) should be more visible to the user. Another developer does express concerns that compatibility scores are not effective, saying that Dependabot's compatibility score has never worked for me.

Further, several developers (6 responses in our survey) hold the belief that Dependabot only works well in projects with a high-quality test suite. For example:

1) Dependabot works best with a high test coverage and if it fails people it's likely because they have too little test coverage.

2) Dependabot without a good test suite is indeed likely too noisy, but with good tests and an understanding of the code base it is trivial to know whether an update is safe to update or not.

---

**Findings for RQ3:**

Compatibility scores are too scarce to be effective: only 3.4% of PRs have a known compatibility score. For those PRs with one, the scores have a weak correlation (\( \rho = 0.37 \)) with the PR merge rate. The scarcity may be because most dependency version pairs do not have sufficient CI test results (i.e., a Pareto-like distribution) for inferring update compatibility.
As a result, developers think Dependabot only works well in projects with high-quality test suites.

### 5.4 RQ4: Configuration

#### 5.4.1 Repository Analysis Methods

Dependabot offers a wide range of configuration options for integration with project workflows, such as whom to request as reviewers, how to write commit messages, and how to apply labels. In this research question, we focus only on the options related to notifications, because we expect them to be possible countermeasures against noise and notification fatigue. More specifically, we investigate the following options provided by Dependabot:

1) schedule.interval: This option is mandatory and specifies how often Dependabot scans project dependencies, checks for new versions, and opens update PRs. Possible values include "daily", "weekly", and "monthly".

2) open-pull-requests-limit: It specifies the maximum number of simultaneously open Dependabot PRs allowed in a project. The default value is five.

3) allow: It tells Dependabot to update only a subset of dependencies. By default, all dependencies are updated.

4) ignore: It tells Dependabot to ignore a subset of dependencies. By default, no dependency is ignored.

The latter two options are very flexible and may contain constraints specific to certain package ecosystems, e.g., allowing updates only in production manifests or ignoring patch updates as defined by the semantic versioning convention [31]. An illustrative dependabot.yml combining these options is sketched at the end of this subsection.

To understand developers' current practice of configuring Dependabot, we parse 3,921 Dependabot configurations from the 1,588 projects with a dependabot.yml in their current working tree.

5. Note that 235 of the 1,823 projects do not have a dependabot.yml in their current working tree; we investigate these in RQ5. One project may depend on more than one package ecosystem (e.g., both npm and PyPI) and have separate configurations for each of them.

For schedule.interval and open-pull-requests-limit, we count the frequency of each value. For allow and ignore, we parse the different options and group them into three distinct strategies:

1) default: allowing Dependabot to update all dependencies, which is its default behavior;

2) ignorelist: configuring Dependabot to ignore a subset of dependencies;

3) allowlist: configuring Dependabot to update only a subset of dependencies.

We further explore the modification history of Dependabot configurations to observe how developers use configuration as a countermeasure against noise in the wild. For this purpose, we find all commits in the 1,823 projects that modified dependabot.yml and extract eight types of configuration changes from the file diffs:

1) +interval: Developers increase schedule.interval.

2) -interval: Developers decrease schedule.interval.

3) +limit: Developers increase open-pull-requests-limit.

4) -limit: Developers decrease open-pull-requests-limit.

5) +allow: Developers allow more dependencies to be automatically updated by Dependabot.

6) -allow: Developers no longer allow some dependencies to be automatically updated by Dependabot.

7) +ignore: Developers configure Dependabot to ignore some dependencies for automated updates.

8) -ignore: Developers configure Dependabot to no longer ignore some dependencies for automated updates.
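As referenced above, the following dependabot.yml sketch illustrates how the four studied options fit together (an illustrative example; the ecosystem, the chosen values, and the ignored dependency are placeholders rather than recommendations):

```yaml
# Illustrative dependabot.yml exercising the four studied options.
version: 2
updates:
  - package-ecosystem: "npm"      # which package manager to scan
    directory: "/"                # where the manifest lives
    schedule:
      interval: "weekly"          # 1) how often Dependabot checks for updates
    open-pull-requests-limit: 5   # 2) at most five simultaneously open PRs
    allow:                        # 3) restrict updates to a dependency subset
      - dependency-type: "production"
    ignore:                       # 4) never propose these updates
      - dependency-name: "lodash"
        update-types: ["version-update:semver-patch"]
```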
Finally, we analyze configuration modifications by time since Dependabot adoption. We focus mainly on bursts of modification patterns, because bursts illustrate the lag between developers' perception of noise and their countermeasures to mitigate it.

#### 5.4.2 Repository Analysis Results

The current configurations of Dependabot show that most projects configure Dependabot toward a proactive update strategy: 2,203 (56.2%) of the schedule.interval values are "daily", while merely 276 (7.04%) are the conservative "monthly". 1,404 (35.8%) of the open-pull-requests-limit configurations are higher than the default value, while only a negligible proportion (2.3%) is lower. For the allow and ignore options, most configurations (3,396, 86.7%) adopt the default strategy, fewer (380, 9.7%) use ignorelist, and a small proportion (50, 1.3%) use allowlist.

The modifications tell another story. 776 (42.57%) of the 1,823 projects in our dataset have modified the Dependabot configuration options we study (e.g., the update interval), with 2.18 modification commits on average (median = 1.00). Figure 7 illustrates the proportion of each modification type and shows that projects increase schedule.interval and lower open-pull-requests-limit more frequently than the opposite. As demonstrated in Figure 8, projects may increase schedule.interval at any time after Dependabot adoption, but are more likely to reduce open-pull-requests-limit only after several months of Dependabot usage. To a large extent, schedule.interval determines how often Dependabot interrupts developers, and we observe developers of 336 projects increasing it in 868 configurations. We further confirm this behavior as a countermeasure against noise with a real-life example where developers reduce the frequency to monthly to reduce noise [77]. open-pull-requests-limit quantifies the developers' workload in each interaction, which is also noise-related, as indicated by a developer's complaint: *Dependabot PRs quickly get out of hand* [78]. If we focus on modifications that happen 90 days after Dependabot adoption, we find that nearly two-thirds (62.5%) of the open-pull-requests-limit changes belong to -limit. Our observations indicate the following phenomenon: at the beginning of adoption, developers configure Dependabot to interact frequently and update proactively; however, they later get overwhelmed, suffer from notification fatigue, and consequently reduce their interaction with Dependabot or even deprecate it (RQ5). As an extreme case, one developer forces Dependabot to open only one PR at a time to reduce noise [79].

Ignoring certain dependencies appears to be another noise countermeasure, since developers tend to add an ignored dependency more often than remove one (Figure 7). For example, one commit says *update ignored packages...so they are never automatically updated to stop noise* [80]. However, we also observe cases where developers add ignored dependencies with other intentions, such as handling breaking changes [81] or preserving backward compatibility [82]. For +allow and -allow, we observe an interesting early burst of -allow (Figure 8c) followed by more +allow modifications later, but we find no evidence explaining this trend.

#### 5.4.3 Triangulation from Survey

Although more than half of the respondents think Dependabot can be configured to fit their needs (25.6% Strongly Agree and 30.2% Agree), some do not (7.8% Totally Disagree and 14% Disagree).
As a peek into this controversy, one developer says: *I think people that complain about how noisy it is (I've seen a lot of this) just don't configure things correctly.*

More than half (50.4%) of the respondents have configured Dependabot to make it less noisy, but roughly one-third (32.6%) have not (21.2% Strongly Agree, 29.5% Agree, 16.7% Neutral, 20.5% Disagree, 12.1% Totally Disagree). It is possible that the default configuration of Dependabot only works for projects with a limited number of dependencies that are not fast-evolving (see Section 5.2.3); for other projects, developers need to tweak the configuration multiple times to find a sweet spot. However, many respondents eventually find that Dependabot does not offer the options they want for noise reduction, such as update grouping and auto-merge. We investigate this in depth in RQ5.

Findings for RQ4:
The majority of Dependabot configurations imply a proactive update strategy, but we observe multiple patterns of noise avoidance in configuration modifications, such as increasing schedule intervals, lowering the maximum number of open PRs, and ignoring certain dependencies.

### 5.5 RQ5: Deprecations & Desired Features

#### 5.5.1 Repository Analysis Methods

To locate projects that may have deprecated Dependabot, we find projects with no dependabot.yml in their current working trees, resulting in 235 projects. For each of them, we identify the last commit that removed dependabot.yml, inspect its commit message, and identify any referenced issues/PRs following the GitHub convention. If the dependabot.yml removal turns out to be due to project restructuring or discontinued maintenance, we consider it a false positive and exclude it from further analysis.

For the remaining 206 projects, we analyze the reasons for deprecation from commit messages and issue/PR text (i.e., titles, bodies, and comments). Since a large proportion of the text in commit messages, issues, and PRs is irrelevant to Dependabot deprecation reasons, two authors read and re-read all text in the corpus, retaining only the relevant parts. They encode reasons from the text and discuss them until reaching a consensus. They do not conduct independent coding or measure inter-rater agreement because the corpus is very small (only 27 deprecations contain documented reasons).

For each of the confirmed deprecations, we check bot configuration files and the commit/PR history to find possible migrations. We consider a project as having migrated to another dependency management bot (or another automation approach) if it meets any of the following criteria:

1) developers have specified the migration target in the commit message or issue/PR text;
2) dependabot.yml is deleted by another dependency management bot (e.g., Renovate Bot automatically deletes dependabot.yml in its setup PR);
3) the project adopts another dependency management bot within 30 days before or after the Dependabot deprecation.

To obtain developers' desired features for a dependency management bot, we ask two optional open-ended questions at the end of the survey (Table 2). The two questions are answered by 97 and 46 developers, respectively. To identify recurring patterns in the answers, two authors of this paper (both with >6 years of software development experience and familiar with using Dependabot) conduct open coding [83] on the responses to generate an initial set of codes; the agreement computation used at the end of this coding process is sketched below.
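The agreement computation can be sketched as follows (a minimal illustration using NLTK's agreement module; the coders, response IDs, and codes are hypothetical):

```python
# Minimal sketch: inter-rater reliability for set-valued codes, combining
# MASI distance with Krippendorff's alpha via NLTK's agreement module.
from nltk.metrics.agreement import AnnotationTask
from nltk.metrics.distance import masi_distance

# (coder, item, labels) triples; labels are frozensets because a single
# survey response may be assigned multiple codes.
data = [
    ("rater1", "resp1", frozenset({"grouped-updates"})),
    ("rater2", "resp1", frozenset({"grouped-updates"})),
    ("rater1", "resp2", frozenset({"auto-merge", "configurability"})),
    ("rater2", "resp2", frozenset({"auto-merge"})),
]

task = AnnotationTask(data=data, distance=masi_distance)
print(f"Krippendorff's alpha = {task.alpha():.3f}")
```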
To arrive at the final set of codes, they read and re-read all answers to familiarize themselves with the data and gain an initial understanding of it. Then, one author assigns text in the answers to initial codes that reflect common features of dependency management bots and discusses with the other author to iteratively refine the codes until a consensus is reached. They then conduct independent coding on the answers using the refined codes and exclude answers that do not reflect anything related to this RQ. As each response may contain multiple codes, we use the MASI distance [84] to measure the distance between the two raters' codes and Krippendorff's alpha [85] to measure inter-rater reliability. The Krippendorff's alpha we obtain is 0.865, which exceeds the recommended threshold of 0.8 and indicates high reliability [85].

#### 5.5.2 Repository Analysis Results

We confirm 206 of the 235 candidates to be real-life Dependabot deprecations, which is substantial considering that our dataset contains only 1,823 projects. From Figure 9, we observe that Dependabot deprecations are, in general, evenly distributed over time with a few fluctuations, mostly coming from organization-wide deprecations. For instance, the peak in December 2020 is caused by 26 Dependabot deprecations in octokit, the official GitHub API client implementation.

We encode nine categories of reasons from the 27 deprecations that explicitly mention their reasons:

1) **Notification Fatigue (9 Deprecations):** Developers recognize Dependabot's overwhelming notifications and PRs as the central issue in their experience with Dependabot. As noted by one developer: "I've been going mad with dependabot alerts which are annoying and pointless. I'd rather do manual upgrades than use this" [86].

2) **Lack of Grouped Update Support (7 Deprecations):** By Dependabot convention, each PR updates one and only one dependency, which is inconvenient in two scenarios: a) related packages tend to follow similar release schedules, which triggers Dependabot to raise a PR storm on their updates [87]; b) in some cases, dependencies must be updated together to avoid breakages [88]. The excessive notifications and additional manual work quickly frustrate developers. For example: a) *My hope was that we can better group dependency upgrades. With the default configuration, there is some grouping happening, but most dependencies would be upgraded individually* [89]; b) *Also, a lot of packages have to be updated together. Separate PRs for everything isn't very fun* [90].

3) **Package Manager Incompatibility (7 Deprecations):** Developers may encounter compatibility issues after the introduction of a new package manager or a newer version of an existing one. In the seven cases we found, five concern yarn v2, one concerns npm v7 (specifically lockfile v3), and one concerns pnpm. To make matters worse, Dependabot may even exhibit undesirable behaviors, e.g., messing around with yarn lockfiles [91], when it encounters such incompatibilities. This contributes to developers' update suspicion, as merging pull requests may break dependency specification files. At the time of writing, Dependabot still has no clear timeline for supporting pnpm [92] or yarn v2 [93].
For the Dependabot users affected, this means reverting [94], patching Dependabot PRs manually or automatically [95], or migrating to an alternative, e.g., Renovate Bot [96].

4) **Lack of Configurability (5 Deprecations):** Dependabot is also deprecated because developers struggle to tailor a suitable configuration. For example: a) *it appears that we're not able to configure Dependabot to only give us major/minor upgrades* [97]; b) *Dependabot would require too much configuration long-term – too easy to forget to add a new package directory* [98]. Developers mention that other dependency management bots provide more fine-grained configuration options, such as update scope and schedule: *(Renovate Bot) has a load more options we could tweak too compared to Dependabot if we want to reduce the frequency further* [99].

5) **Absence of Auto-Merge (3 Deprecations):** Alfadel et al. [54] illustrate that auto-merge features are tightly associated with rapid PR merges. However, GitHub refused to offer this feature in Dependabot [100], claiming that auto-merge allows malicious dependencies to propagate beyond the supervision of project maintainers. This may render Dependabot impractical, as claimed by one developer: *(the absence of auto-merge) creates clutter and possibly high maintenance load.*

We notice that a non-negligible proportion (8.17%) of pull requests is merged by third-party auto-merge implementations (e.g., a CI workflow or a GitHub App). Unfortunately, these may become dysfunctional on public repositories after GitHub enforced a change on workflows triggered by Dependabot PRs [101]. This turns out to be the last straw for several Dependabot deprecations. As one developer states, they dropped Dependabot because *latest changes enforced by GitHub prevent using the action in Dependabot's PR's context.*

6) **High CI Usage (3 Deprecations):** Maintainers of three projects complain that Dependabot's numerous, auto-rebasing PRs have devoured their CI credits. In their words, Dependabot's CI usage is *what killed us with Dependabot* and *a waste of money and carbon.*

Other reasons for Dependabot deprecation include: 7) **Dependabot Bugs (2 Deprecations),** 8) **Unsatisfying Branch Support (1 Deprecation),** and 9) **Inability to Modify Custom Files (1 Deprecation).**

The deprecation of Dependabot does not necessarily mean that developers lose faith in automating dependency updates. In fact, over two-thirds (68.4%, 141/206) of the projects turn to another bot or set up custom CI workflows to support their dependency updates. Among them, Renovate Bot (122) is the most popular migration target, followed by projen (15), npm-check-updates (2), and depfu (1).

#### 5.5.3 Triangulation from Survey

Among the 131 surveyed developers, 14 (10.7%) tell us they have deprecated Dependabot in their projects. Most of the reasons they provide fall within our analysis, and the frequency distribution is highly similar. There are two exceptions: one developer deprecated Dependabot because it frequently broke their code, and another because their entire project had stalled. Developers also respond in our survey that they consider automated dependency management important and beneficial for their projects, but the limitations of Dependabot cause them to deprecate it. For example: *Dependabot could be great, it just needs a few fixes here and there. It's unclear why Dependabot hasn't been polished.*
They also reply that Renovate Bot does provide some features they need (e.g., grouped update PRs).

We identify nine major categories of developers' desired features (each corresponding to one code) from the answers provided by 84 respondents. The remaining categories are discarded because they are supported by only one answer each (and thus may be incidental and not generalizable). We explain each category in order of popularity.

1) **Group Update PRs (29 Respondents):** This category refers to the feature of automatically grouping several dependency updates into one PR instead of opening one PR for each update. It is the most frequently mentioned category, and developers consider this feature an important measure for making the handling of bot PRs less tedious, repetitive, and time-consuming. They want the bot to automatically identify dependencies that should be updated together and merge them into one update PR, because many libraries (e.g., symfony, @typescript-eslint, babel) version all packages under a single version. They also want the bot to automatically find and merge "safe" updates into one PR while leaving "unsafe" updates as single PRs for more careful reviewing.

2) **Package Manager Support (20 Respondents):** This category refers to the feature of supporting more package managers (and their corresponding ecosystems), or features that align the bot with the conventions of a package manager/ecosystem. Developers have expressed their desire for the bot to support Gradle, Flutter, Poetry, Anaconda, C++, yarn v2, Clojure, Cargo, CocoaPods, Swift Package Manager (on iOS), etc., indicating that dependency management bots, if well designed and implemented, can benefit a wide range of developers and software development domains. Dependabot does claim support for many of the package managers mentioned above, but it still needs to be tailored and improved in, e.g., performance and update behavior: a) *When I have 3 open Poetry updates I can merge one and then have to wait 15 minutes for the conflicts to be resolved.* b) *Perhaps for node.js projects the ability to update package.json in addition to package.lock, so the dependency update is made explicit.*

3) **Auto-Merge (19 Respondents):** This category refers to the feature of automatically merging some update PRs into the repository if certain conditions are satisfied. As mentioned in Section 5.3.3, some developers believe that as long as their projects have high-quality test suites, reviewing update PRs is trivial, and they would prefer such PRs to be merged automatically if the tests pass.

Despite the significant demand, this feature also seems especially controversial, because it means offloading trust and granting the bot autonomy. Although GitHub considers it unacceptable due to security risks [100], our survey clearly indicates that many developers still want it even though they are well aware of the risks. They also think the responsibility for risk control, e.g., vetting new releases, should be given to some capable central authority, not to them. Here are three response examples: a) *While this might be somewhat dangerous, and should be configurable somehow, [auto-merge] is something that basically already happens when I merge such PRs.*
b) *If I am merging with Dependabot like 60 deps a day - I don't know if some of the versions are not published by hackers who took over the repository account, so it would be great if there was some authority where humans actually check the changes and mark them secure.* c) *For me it'd be good if I could mute all notifications about Dependabot PRs except for when tests failed, indicating that I need to manually resolve some issues. Otherwise I'd be happy not to hear about it updating my deps.*

4) **Display Release Notes (8 Respondents):** This category refers to the feature of always showing some form of release notes or changelog in update PRs to inform developers of the changes in an update. Although Dependabot can sometimes provide release notes in PRs (Figure 1), it fails to do so for 24.8% of the PRs in our dataset. One possible reason is that release notes are often missing or inaccessible in open source projects [35], which is also confirmed by one of our survey respondents: *Most npm package updates feel unnecessary and the maintainers very often don't bother to write meaningful release notes...At the same time, I shouldn't expect maintainers to go through all of their dependencies' changelogs either, so perhaps the tool should find those release notes for me.*

5) **Avoid Unnecessary Updates (7 Respondents):** This category refers to the feature of providing default behaviors and configuration options that avoid updates most developers in an ecosystem perceive as unnecessary. The most frequently mentioned feature is the ability to define separate update behaviors for development and production (or runtime) dependencies. Many developers would avoid the automatic update of development dependencies because they perceive such updates as mostly noise, with very little gain from keeping development dependencies up-to-date. Other mentioned features include the ability to detect and avoid updates of bloated dependencies and to only propose updates for dependencies with real security vulnerabilities.

6) **Custom Update Action (5 Respondents):** This category refers to the ability to define custom update behaviors (using, e.g., regular expressions) to update dependencies in unconventional dependency files.

7) **Configurability (5 Respondents):** This category covers respondents who state that dependency management bots should be highly configurable without specifying the configuration options they want (e.g., *more configuration options*).

8) **git Support (4 Respondents):** This category concerns the integration of dependency management bots with the version control system (in our case, git). The specific features mentioned include automatic rebasing, merge conflict resolution, squashing, etc., all of which help ensure that bot PRs do not impose additional work on developers (e.g., manipulating git branches and resolving conflicts).

9) **Breaking Change Impact Analysis (3 Respondents):** This category refers to the ability to perform program analysis to identify breaking changes and their impact on client code, e.g., *something like a list of parts of my codebase that might be impacted by the update would be useful.*
*This could be based on a combination of changes listed in the release notes and an analysis of where the package is used in my code.*

The developers' desired features align well with the reasons for Dependabot deprecation, indicating that feature availability can be an important driver of the migrations and the competition between dependency management bots.

Findings for RQ5:

11.3% of the studied projects have deprecated Dependabot due to notification fatigue, lack of grouped update support, package manager incompatibility, lack of configurability, absence of auto-merge, etc. 68.4% of them migrate to other ways of automation, among which the most common migration target is Renovate Bot (86.5%). We identify nine categories of developers' desired features that align well with the Dependabot deprecation reasons.

6 Discussion

### 6.1 The State of Dependency Management Bots

In a nutshell, our results indicate that Dependabot can be an effective solution for keeping dependencies up-to-date (RQ1, RQ2), but often at the cost of significant noise and workload (RQ1, RQ4, RQ5), much of which cannot be mitigated by the features and configuration options offered by Dependabot (RQ5). Apart from that, Dependabot's compatibility score solution is hardly a success in indicating the compatibility of a bot update PR (RQ3). As of March 2023, Dependabot is still under active development by GitHub, with the majority of the effort devoted to supporting more ecosystems (e.g., Docker, GitHub Actions) and adding features to reduce noise (e.g., automatically terminating Dependabot in inactive repositories), according to the GitHub changelog [102]. Still, there is plenty of room for improvement in tackling the update suspicion and notification fatigue problems [8].

Among other dependency management bots, Renovate Bot is an actively developed and popular alternative to the Dependabot version update service (RQ5), while Greenkeeper [9] has been deprecated, PyUp [5] seems to be no longer under active development, and Snyk Bot [6] mainly offers security-focused solutions. As of March 2023, Renovate Bot provides more features and configuration options than Dependabot for fine-tuning notifications, including update grouping and auto-merge [103]; it also provides merge confidence badges with more information than Dependabot's compatibility scores [104]. However, it is still unclear whether the features and strategies adopted by Renovate Bot are actually effective in practice, and we believe Renovate Bot could be an important subject for future studies of dependency management bots.

### 6.2 What Should be the Key Characteristics of a Dependency Management Bot?

In this section, we summarize the key characteristics of an ideal dependency management bot based on the results of our analysis and previous work. We believe they can serve as general design guidelines for practitioners to design, implement, or improve dependency management bots (or other similar automation solutions).

**Configurability.** Wessel et al. [40] argue that noise is the central challenge in SE bot design and that re-configuration should be the main countermeasure against noise. In the case of Dependabot, we find that it, too, causes noise by opening more PRs than developers can handle (RQ4), and that developers may re-configure it multiple times to reduce this noise (RQ4). However, re-configuration is not always successful due to the lack of certain features in Dependabot, causing deprecations and migrations (RQ5).
As with many other software development activities, a "silver bullet" is unlikely to exist, as noted by one of our survey respondents: *...there is no best practice in dependency management which is easy, fast and safe.*

Therefore, we argue that configurability, i.e., offering the highest possible configuration flexibility for controlling the bot's update behavior, should be one of the key characteristics of dependency management bots. This helps the bot minimize unnecessary update notifications and attempts so that developers are interrupted less. Apart from the options already provided by Dependabot, our study indicates that the following configuration options should be present in dependency management bots:

1) **Grouped Updates:** Dependency management bots should provide options to group multiple updates into one PR. Possible options include grouping all "safe" updates (e.g., those not breaking the CI checks) and grouping updates of closely related dependencies (e.g., different components from the same framework).

2) **Update Strategies:** Dependency management bots should allow developers to specify which dependencies to update based on richer conditions, such as whether the dependency is used in production, the severity of security vulnerabilities, whether the dependency is bloated, etc.

3) **Version Control System Integration:** Dependency management bots should allow developers to define how the bot interacts with the version control system, including which branch to monitor, how to manipulate branches and handle merge conflicts, etc.

**Autonomy.** According to the SE bot definition by Erlenhov et al. [39], the key characteristic of an "Alex" type of SE bot is its ability to autonomously handle (often simple) development tasks, and its central design challenges include minimizing interruption and establishing trust with developers. However, without an auto-merge feature, Dependabot is hardly autonomous, and this lack of autonomy is disliked by developers (RQ5); in extreme cases, developers use Dependabot entirely as a notification tool rather than as a bot (Section 5.1.3). The lack of autonomy also causes a high level of interruption and workload for developers using Dependabot in their projects (RQ5).

We argue that autonomy, i.e., the ability to perform dependency updates autonomously without human intervention under certain conditions, should be one of the key characteristics of dependency management bots. This characteristic is only possible when the risks and consequences of dependency updates are highly transparent and developers know when to trust these updates. Within the context of GitHub, we believe current dependency management bots should offer a configuration option to merge update PRs when the CI pipeline passes. This option can be turned on for projects that have a well-configured CI pipeline with thorough static analysis, building, and testing stages, when the developers believe that their pipeline can effectively detect incompatibilities in dependency updates (Section 5.3.3).

With respect to the security concern of *auto-merge being used to quickly propagate a malicious package across the ecosystem* [100], we argue that the responsibility of verifying the security of new releases should not fall on individual developers, as they usually lack the required time and expertise (RQ5).
Instead, package hosting platforms (e.g., npm, Maven, PyPI) should vet new package releases and quickly take down malicious releases to minimize their impact. These practices are also advocated in the literature on software supply chain attacks [105].

**Transparency.** Multiple previous studies, both on SE bots and on other kinds of software bots, point to the importance of transparency in bot design. For example, Erlenhov et al. [39] show that developers need to establish trust that a bot can perform development tasks correctly. Similarly, Godulla et al. [106] argue that transparency is vital for bots used in corporate communications. In the context of code review bots, Peng and Ma [107] find that contributors expect the bot to be transparent about why a certain code reviewer is recommended. To reduce update suspicion [8] toward dependency management bots, developers likewise need to know when to trust the bot to perform dependency updates.

We argue that transparency, i.e., the ability to transparently demonstrate the risks and consequences of a dependency update, should be one of the key characteristics of dependency management bots. However, Dependabot's compatibility score feature is hardly a success in this direction, and developers only trust their own test suites. Beyond compatibility scores and project test suites, the following research directions may help enable transparency in dependency management bots and establish trust among bot users:

1) **Program Analysis**: One direction is to leverage program analysis techniques. There has been significant research and practitioner effort on breaking change analysis [36], and two approaches have demonstrated the potential of static analysis for assessing bot PR compatibility [34], [108]. Still, given the extremely large scale of bot PRs [7], more research and engineering effort is needed to implement lightweight and scalable approaches that support each popular ecosystem.

2) **CI Log Analysis**: Another direction is to extend the idea of compatibility scores with more sophisticated techniques that learn additional knowledge from CI checks. Since CI checks are scarce for many version pairs (RQ3), it would be interesting to explore techniques that transfer knowledge from other version pairs so that the matrix in Figure 5a becomes less sparse. The massive number of CI checks available from Dependabot PRs would be a promising starting point.

3) **Release Note Generation**: Dependabot sometimes fails to locate and provide release notes for the updated dependency, and even when release notes exist, *the maintainers very often don't bother to write meaningful release notes*, as noted by one respondent. This situation can be mitigated by applying approaches for software change summarization (e.g., [109]) and release note generation (e.g., [110]).

**Self-Adaptability.** The ability to adapt to a specific environment and its dynamics is considered one of the key characteristics of a "rational agent" in artificial intelligence [111], [112]. Dependency management bots can also be considered autonomous agents working in the artificial environment of social coding platforms (e.g., GitHub). However, our findings reveal that Dependabot often cannot operate in the ways expected by developers (RQ5) and that reconfigurations are common (RQ4).
Such failures (e.g., in update actions, package manager compatibility, or git branching) lead to interruption and extra work for developers.

We argue that self-adaptability, i.e., the ability to automatically identify and self-adapt to a sensible default configuration in a project's environment, should be one of the key characteristics of dependency management bots. For a GitHub project, this environment can include its major programming languages, package managers and ecosystems, the workflows used, the active timezone, developer preferences, recent activities, etc. A dependency management bot should be able to automatically generate a configuration file based on such information and recommend configuration changes when the environment changes (e.g., when developer responses to bot PRs become slower than usual). This could be implemented as a semi-automatic recommender system that suggests an initial configuration to developers and, after bot adoption, opens bot PRs that modify the bot's own configuration.

### 6.3 Comparison with Previous Work

Several previous studies have also made similar recommendations based on results from Greenkeeper or Dependabot [8], [37], [54], [56]. Studies on Greenkeeper [8], [37] show that dependency management bots cause noise to developers and that CI test results are unreliable, but they do not investigate the effectiveness of bot configurations as a countermeasure against noise. Studies on Dependabot [54], [56] either focus on a different aspect (i.e., security updates [54]) or provide specific recommendations on Dependabot features [56]. Compared with these previous studies, the contributions of our study are: 1) a systematic investigation of the Dependabot version update service, and 2) a comprehensive four-dimension framework for dependency management bot design.

The implications of our study also relate to the larger literature on SE bots and dependency management. With respect to these two fields, the contribution of our study is a unique lens of observation, i.e., Dependabot, which results in a set of tailored recommendations for dependency management bot design. We have carefully discussed in Section 6.2 how the implications of our study confirm, extend, or echo those of the existing literature.

### 6.4 Threats to Validity

#### 6.4.1 Internal Validity

In RQ1, we provide a holistic analysis of the impact of Dependabot adoption without incorporating possible confounding factors (e.g., the types of dependencies and the characteristics of projects). Consequently, it is difficult for our study to establish a firm answer on the effectiveness of adopting Dependabot, and future work is needed to better quantify this impact amid possible confounding factors.

Several approximations are used throughout our analysis. In RQ2, we resort to identifying security PRs ourselves, which may introduce hard-to-confirm errors (only repository owners know whether their PRs are security-related). The merge rate may not accurately reflect the extent to which Dependabot updates are accepted by developers, as some projects may accept contributions in different ways. To mitigate this threat, we focus on projects that have merged at least 10 Dependabot PRs, with the intuition that these projects are unlikely to accept Dependabot PRs in other ways if they have already merged many of them.
In RQ3, Dependabot's compatibility scores may change over time, and it is impossible to know the score at the time of PR creation. In RQ4, Dependabot supports ecosystem-specific matchers in dependency specifications, e.g., @angular/*, which we do not consider when parsing configuration files. However, we believe the noise introduced above should be minor and will not invalidate our findings or hinder the reproducibility of our data analysis. Like other studies involving manual coding, our analysis of developer discussions and survey responses is vulnerable to author bias. To mitigate this, two authors double-check all results and validate the RQ5 findings against project commit/PR histories; they further conduct an inter-rater reliability analysis on the larger coded dataset in RQ5. Finally, our interpretation of the data (RQ1–RQ5) may also be biased toward our own judgment. To mitigate this, we triangulate our key findings using a developer survey and derive implications based on both our analysis and developers' feedback.

#### 6.4.2 External Validity

As with all case studies, our specific findings in each RQ should be generalized with caution to other dependency management bots and even to other projects that use Dependabot. Our dataset contains only popular and actively maintained GitHub projects, many of which already take proactive updating strategies. Therefore, our findings may not generalize to projects of a smaller scale or projects more reluctant to update dependencies. The survey responses are collected through convenience sampling, which may introduce possible yet unknown biases in terms of experience, age, gender, development role, etc., so the generalization of our survey results to a broad developer audience should also be treated with caution. The outcomes of Dependabot usage may likewise not generalize to other dependency management bots due to differences in functionality and user base. In RQ1, we base our analysis only on JavaScript/npm projects, which may not generalize to other ecosystems with different norms, policies, and practices [11]; the comparison of dependency management bot usage across ecosystems could be an important avenue for future work. Despite these threats, we believe the implications we obtain for dependency management bot design should be general. Our proposed framework in Section 6.2 forms a roadmap for dependency management bot designers, and our methodology could be applied in future studies to compare the effectiveness of different bots.

7 Conclusion

We present an exploratory study of the Dependabot version update service using repository mining and a survey, and we identify important limitations in the design of Dependabot. From our findings, we derive a four-dimension framework in the hope that it can guide dependency management bot design and inspire more research in related fields.

Several directions for future work arise from our study. For example, investigating and comparing other dependency management bots, especially Renovate Bot, can help verify the generalizability of our proposed framework. An empirical foundation on the factors affecting the effectiveness of bot adoption is also necessary.
It will be interesting to investigate the recommendation of bot configurations to developers, or to study how different approaches (e.g., program analysis, machine learning, release note generation) can help developers assess the compatibility of bot PRs.\n\n8 Data Availability\n\nWe provide a replication package at Figshare:\n\nhttps://figshare.com/s/78a92332e4843d64b984\n\nThe package can be used to replicate the results from repository mining. To preserve the privacy of survey respondents, we choose not to disclose any raw data from the survey.\n\nAcknowledgments\n\nThis work is supported by the National Key R&D Program of China Grant 2018YFB1004201 and the National Natural Science Foundation of China Grant 61825201. We sincerely thank the developers who participated in our survey.\n\nReferences\n\n[1] T. Winters, T. Manshreck, and H. Wright, Software Engineering at Google: Lessons Learned from Programming over Time. O\u2019Reilly Media, 2020.\n[2] R. G. Kula, D. M. Germ\u00e1n, A. Ouni, T. Ishio, and K. Inoue, \u201cDo developers update their library dependencies? - an empirical study on the impact of security advisories on library migration,\u201d Empir. Softw. Eng., vol. 23, no. 1, pp. 384\u2013417, 2018.\n[3] https://github.com/dependabot.\n[4] https://github.com/renovatebot.\n[5] https://pyup.io/.\n[6] https://github.com/snyk-bot.\n[7] M. Wyrich, R. Ghit, T. Haller, and C. M\u00fcller, \u201cBots don\u2019t mind waiting, do they? comparing the interaction with automatically and manually created pull requests,\u201d in 3rd IEEE/ACM International Workshop on Bots in Software Engineering, BotSE@ICSE 2021, Madrid, Spain, June 4, 2021. IEEE, 2021, pp. 6\u201310.\n[8] S. Mirhosseini and C. Parnin, \u201cCan automated pull requests encourage software developers to upgrade out-of-date dependencies?\u201d in Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017, Urbana, IL, USA, October 30 - November 03, 2017. IEEE Computer Society, 2017, pp. 84\u201394.\n[9] https://greenkeeper.io/.\n[10] https://www.sonatype.com/resources/state-of-the-software-supply-chain-2021.\n[11] C. Bogart, C. K\u00e4stner, J. D. Herbsleb, and F. Thung, \u201cWhen and how to make breaking changes: Policies and practices in 18 open source software ecosystems,\u201d ACM Trans. Softw. Eng. Methodol., vol. 30, no. 4, pp. 42:1\u201342:56, 2021.\n[12] G. Bavota, G. Canfora, M. D. Penta, R. Oliveto, and S. Panichella, \u201cHow the apache community upgrades dependencies: an evolutionary study,\u201d Empir. Softw. Eng., vol. 20, no. 5, pp. 1275\u20131317, 2015.\n[13] I. Pashchenko, D. L. Vu, and F. Massacci, \u201cA qualitative study of dependency management and its security implications,\u201d in CCS \u201920: 2020 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, USA, November 9-13, 2020. ACM, 2020, pp. 1513\u20131531.\n[14] J. Cox, E. Bouwers, M. C. J. D. van Eekelen, and J. Visser, \u201cMeasuring dependency freshness in software systems,\u201d in 37th IEEE/ACM International Conference on Software Engineering, ICSE 2015, Florence, Italy, May 16-24, 2015, Volume 2. IEEE Computer Society, 2015, pp. 109\u2013118.\n[15] J. M. Gonz\u00e1lez-Barahona, P. Sherwood, G. Robles, and D. 
Izquierdo-Cortazar, \u201cTechnical lag in software compilations: Measuring how outdated a software deployment is,\u201d in Open Source Systems: Towards Robust Practices - 13th IFIP WG 2.13 International Conference, OSS 2017, Buenos Aires, Argentina, May 22-23, 2017, Proceedings, ser. IFIP Advances in Information and Communication Technology, vol. 496, 2017, pp. 182\u2013192.\n[16] A. Zerouali, T. Mens, and E. Constantinou, \u201cOn the evolution of technical lag in the npm package dependency network,\u201d in 2018 IEEE International Conference on Software Maintenance and Evolution, ICSME 2018, Madrid, Spain, September 23-29, 2018. IEEE Computer Society, 2018, pp. 404\u2013414.\n[17] A. Zerouali, E. Constantinou, T. Mens, G. Robles, and J. M. Gonz\u00e1lez-Barahona, \u201cAn empirical analysis of technical lag in npm package dependencies,\u201d in New Opportunities for Software Reuse - 17th International Conference, ICSR 2018, Madrid, Spain, May 21-23, 2018, Proceedings, ser. Lecture Notes in Computer Science, vol. 10826. Springer, 2018, pp. 95\u2013110.\n[18] A. Zerouali, T. Mens, J. M. Gonz\u00e1lez-Barahona, A. Decan, E. Constantinou, and G. Robles, \u201cA formal framework for measuring technical lag in component repositories - and its application to npm,\u201d J. Softw. Evol. Process., vol. 31, no. 8, 2019.\n[19] J. Stringer, A. Tahir, K. Blincoe, and J. Dietrich, \u201cTechnical lag of dependencies in major package managers,\u201d in 27th Asia-Pacific Software Engineering Conference, APSEC 2020, Singapore, December 1-4, 2020. IEEE, 2020, pp. 228\u2013237.\n[20] A. Zerouali, T. Mens, A. Decan, J. M. Gonz\u00e1lez-Barahona, and G. Robles, \u201cA multi-dimensional analysis of technical lag in Debian-based docker images,\u201d Empir. Softw. Eng., vol. 26, no. 2, p. 19, 2021.\n[21] K. Chow and D. Notkin, \u201cSemi-automatic update of applications in response to library changes,\u201d in 1996 International Conference\non Software Maintenance (ICSM \u201996), 4-8 November 1996, Monterey, CA, USA, Proceedings. IEEE Computer Society, 1996, p. 359.\n\n[22] J. Henkel and A. Diwan, \u201cCatchUp: capturing and replaying refactorings to support API evolution,\u201d in 27th International Conference on Software Engineering (ICSE 2005), 15-21 May 2005, St. Louis, Missouri, USA. ACM, 2005, pp. 274\u2013283.\n\n[23] Z. Xing and E. Stroulia, \u201cAPI-evolution support with Diff-CatchUp,\u201d IEEE Trans. Software Eng., vol. 33, no. 12, pp. 818\u2013836, 2007.\n\n[24] H. A. Nguyen, T. T. Nguyen, G. W. Jr., A. T. Nguyen, M. Kim, and T. N. Nguyen, \u201cA graph-based approach to API usage adaptation,\u201d in Proceedings of the 25th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2010, October 17-21, 2010, Reno/Tahoe, Nevada, USA. ACM, 2010, pp. 302\u2013321.\n\n[25] B. Dagenais and M. P. Robillard, \u201cRecommending adaptive changes for framework evolution,\u201d ACM Trans. Softw. Eng. Methodol., vol. 20, no. 4, pp. 19:1\u201319:35, 2011.\n\n[26] B. Cossette and R. J. Walker, \u201cSeeking the ground truth: a retrospective study on the evolution and migration of software libraries,\u201d in 20th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE-20), SIGSOFT/FSE\u201912, Cary, NC, USA - November 11 - 16, 2012. ACM, 2012, p. 55.\n\n[27] K. Huang, B. Chen, L. Pan, S. Wu, and X. 
Peng, \u201cREPFINDER: finding replacements for missing APIs in library update,\u201d in 36th IEEE/ACM International Conference on Automated Software Engineering, ASE 2021, Melbourne, Australia, November 15-19, 2021. IEEE, 2021, pp. 266\u2013278.\n\n[28] B. B. Nielsen, M. T. Torp, and A. M\u00f8ller, \u201cSemantic patches for adaptation of JavaScript programs to evolving libraries,\u201d in 43rd IEEE/ACM International Conference on Software Engineering, ICSE 2021, Madrid, Spain, 22-30 May 2021. IEEE, 2021, pp. 74\u201385.\n\n[29] S. A. Haryono, F. Thung, D. Lo, J. Lawall, and L. Jiang, \u201cML-CatchUp: Automated update of deprecated machine-learning APIs in Python,\u201d in IEEE International Conference on Software Maintenance and Evolution, ICSEM 2021, Luxembourg, September 27 - October 1, 2021. IEEE, 2021, pp. 584\u2013588.\n\n[30] S. A. Haryono, F. Thung, D. Lo, L. Jiang, J. Lawall, H. J. Kang, L. Semino, and C. M\u00fcller, \u201cAndroEvolve: Automated Android API update with data flow analysis and variable denormalization,\u201d Empir. Softw. Eng., vol. 27, no. 3, p. 73, 2022.\n\n[31] https://semver.org/.\n\n[32] S. Mostafa, R. Rodriguez, and X. Wang, \u201cExperience paper: a study on behavioral backward incompatibilities of Java software libraries,\u201d in Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, Santa Barbara, CA, USA, July 10 - 14, 2017. ACM, 2017, pp. 215\u2013225.\n\n[33] S. Raemaekers, A. van Deursen, and J. Visser, \u201cSemantic versioning and impact of breaking changes in the Maven repository,\u201d J. Syst. Softw., vol. 129, pp. 140\u2013158, 2017.\n\n[34] J. Hejderup and G. Gousios, \u201cCan we trust tests to automate dependency updates? A case study of Java projects,\u201d J. Syst. Softw., vol. 183, p. 111097, 2022.\n\n[35] J. Wu, H. He, W. Xiao, K. Gao, and M. Zhou, \u201cDemystifying software release note issues on GitHub,\u201d in Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, ICPC 2022, Pittsburgh, USA, May 16-17, 2022. ACM, 2022.\n\n[36] P. Lam, J. Dietrich, and D. J. Pearce, \u201cPutting the semantics into semantic versioning,\u201d in Proceedings of the 2020 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software, Onward! 2020, Virtual, November, 2020. ACM, 2020, pp. 157\u2013179.\n\n[37] B. Rombaut, F. R. Cogo, B. Adams, and A. E. Hassan, \u201cThere\u2019s no such thing as a free lunch: Lessons learned from exploring the overhead introduced by the Greenkeeper dependency bot in npm,\u201d ACM Transactions on Software Engineering and Methodology, 2022.\n\n[38] M. S. Wessel, B. M. de Souza, I. Steinmacher, I. S. Wiese, I. Polato, A. P. Chaves, and M. A. Gerosa, \u201cThe power of bots: Characterizing and understanding bots in OSS projects,\u201d Proc. ACM Hum. Comput. Interact., vol. 2, no. CSCW, pp. 182:1\u2013182:19, 2018.\n\n[39] L. Erlenhov, F. G. de Oliveira Neto, and P. Leitner, \u201cAn empirical study of bots in software development: characteristics and challenges from a practitioner\u2019s perspective,\u201d in ESEC/FSE \u201920: 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Virtual Event, USA, November 8-13, 2020. ACM, 2020, pp. 445\u2013455.\n\n[40] M. S. Wessel, I. Wiese, I. Steinmacher, and M. A. Gerosa, \u201cDon\u2019t disturb me: Challenges of interacting with software bots on open source software projects,\u201d Proc. ACM Hum. Comput. 
Interact., vol. 5, no. CSCW2, pp. 1–21, 2021.

[41] M. S. Wessel, A. Abdellatif, I. Wiese, T. Conte, E. Shihab, M. A. Gerosa, and I. Steinmacher, “Bots for pull requests: The good, the bad, and the promising,” in 44th IEEE/ACM International Conference on Software Engineering, ICSE 2022, Pittsburgh, PA, USA, May 25-27, 2022. IEEE, 2022, pp. 274–286.

[42] E. Shihab, S. Wagner, M. A. Gerosa, M. Wessel, and J. Cabot, “The present and future of bots in software engineering,” IEEE Software, 2022.

[43] S. Santhanam, T. Hecking, A. Schreiber, and S. Wagner, “Bots in software engineering: a systematic mapping study,” PeerJ Comput. Sci., vol. 8, p. e866, 2022.

[44] https://github.com/apps/dependabot-preview.

[45] L. Erlenhov, F. G. de Oliveira Neto, and P. Leitner, “Dependency management bots in open-source systems - prevalence and adoption,” PeerJ Comput. Sci., vol. 8, p. e849, 2022.

[46] T. Dey, S. Mousavi, E. Ponce, T. Fry, B. Vasilescu, A. Filippova, and A. Mockus, “Detecting and characterizing bots that commit code,” in MSR ’20: 17th International Conference on Mining Software Repositories, Seoul, Republic of Korea, 29-30 June, 2020. ACM, 2020, pp. 209–219.

[47] https://www.indiehackers.com/interview/living-off-our-savings-and-growing-our-saas-to-740-mo.

[48] https://www.indiehackers.com/product/dependabot-acquired-by-github-1g7T7DN1rGEZM204shF.

[49] https://github.com/baker/dependabot-preview/.

[50] https://docs.github.com/en/code-security/supply-chain-security/keeping-your-dependencies-updated/configuration-options-for-dependency-updates.

[51] https://docs.github.com/en/code-security/supply-chain-security/managing-vulnerabilities-in-your-projects-dependencies/about-alerts-for-vulnerable-dependencies#access-to-dependabot-alerts.

[52] Pull Request #1127 of datadesk/baker.

[53] https://docs.github.com/en/code-security/supply-chain-security/managing-vulnerabilities-in-your-projects-dependencies/about-dependabot-security-updates.

[54] M. Alfadel, D. E. Costa, E. Shihab, and M. Mkhallalati, “On the use of Dependabot security pull requests,” in 18th IEEE/ACM International Conference on Mining Software Repositories, MSR 2021, Madrid, Spain, May 17-19, 2021. IEEE, 2021, pp. 254–265.

[55] C. Soto-Valero, T. Durieux, and B. Baudry, “A longitudinal analysis of bloated Java dependencies,” in ESEC/FSE ’21: 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Athens, Greece, August 23-28, 2021. ACM, 2021, pp. 1021–1031.

[56] F. R. Cogo and A. E. Hassan, “Understanding the customization of dependency bots: The case of dependabot,” IEEE Software, 2022.

[57] Pull Request #4317 of caddyserver/caddy.

[58] G. Gousios, “The GHTorrent dataset and tool suite,” in Proceedings of the 10th Working Conference on Mining Software Repositories, ser. MSR ’13. Piscataway, NJ, USA: IEEE Press, 2013, pp. 233–236.

[59] N. Munaiah, S. Kroh, C. Cabrey, and M. Nagappan, “Curating GitHub for engineered software projects,” Empir. Softw. Eng., vol. 22, no. 6, pp. 3219–3253, 2017.

[60] H. He, R. He, H. Gu, and M.
Zhou, \u201cA large-scale empirical study on Java library migrations: prevalence, trends, and rationales,\u201d in ESEC/FSE \u201921: 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Athens, Greece, August 23-28, 2021. ACM, 2021, pp. 478\u2013490.\n\n[61] https://docs.github.com/en/rest.\n\n[62] https://workers.cloudflare.com/.\n\n[63] https://github.com/advisories.\n\n[64] M. Goeminnie and T. Mens, \u201cEvidence for the Pareto principle in open source software activity,\u201d in the Joint Proceedings of the 1st International workshop on Model Driven Software Maintenance and 5th International Workshop on Software Quality and Maintainability. Citeseer, 2011, pp. 74\u201382.\n\n[65] Y. Zhang, M. Zhou, A. Mockus, and Z. Jin, \u201cCompanies\u2019 participation in OSS development\u2014an empirical study of OpenStack,\u201d IEEE Trans. Software Eng., vol. 47, no. 10, pp. 2242\u20132259, 2021.\n[66] A. Decan, T. Mens, and P. Grosjean, \u201cAn empirical comparison of dependency network evolution in seven software packaging ecosystems,\u201d Empir. Softw. Eng., vol. 24, no. 1, pp. 381\u2013416, 2019.\n\n[67] https://github.com/dependabot/dependabot-core/issues/4146.\n\n[68] R. Likert, \u201cA technique for the measurement of attitudes.\u201d Archives of Psychology, 1932.\n\n[69] https://tools4dev.org/resources/how-to-choose-a-sample-size/.\n\n[70] X. Tan, M. Zhou, and Z. Sun, \u201cA first look at good first issues on GitHub,\u201d in ESEC/FSE \u201920: 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Virtual Event, USA, November 8-13, 2020. ACM, 2020, pp. 398\u2013409.\n\n[71] https://www.qualtrics.com/blog/ethical-issues-for-online-surveys/.\n\n[72] Y. Zhao, A. Serebrenik, Y. Zhou, V. Filkov, and B. Vasilescu, \u201cThe impact of continuous integration on other software development practices: a large-scale empirical study,\u201d in Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017, Urbana, IL, USA, October 30 - November 03, 2017. IEEE Computer Society, 2017, pp. 60\u201371.\n\n[73] N. Cassee, B. Vasilescu, and A. Serebrenik, \u201cThe silent helper: The impact of continuous integration on code reviews,\u201d in 27th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2020, London, ON, Canada, February 18-21, 2020. IEEE, 2020, pp. 423\u2013434.\n\n[74] J. Romano, J. D. Kromrey, J. Coraggio, and J. Skowronek, \u201cAppropriate statistics for ordinal level data: Should we really be using t-test and Cohen\u2019s d for evaluating group differences on the nse and other surveys,\u201d in Annual Meeting of the Florida Association of Institutional Research, vol. 177, 2006, p. 34.\n\n[75] S. Prion and K. Haerling, \u201cMaking sense of methods and measurement: Spearman-rho ranked-order correlation coefficient,\u201d Clinical Simulation in Nursing, vol. 10, p. 535\u2013536, 10 2014.\n\n[76] P. Sturgis, C. Roberts, and P. Smith, \u201cMiddle alternatives revisited: How the neither/nor response acts as a way of saying \u201ci don\u2019t know\u201d?\u201d Sociological Methods & Research, vol. 43, no. 1, pp. 
15\u201338, 2014.\n\n[77] Pull Request #259 of dropbox/stone.\n\n[78] Pull Request #3155 of tuist/tuist.\n\n[79] Pull Request #663 of ros-tooling/action-ros-ci.\n\n[80] Commit #b337b5f of justeat/httpclient-interception.\n\n[81] Pull Request #1260 of asynkron/protoactor-dotnet.\n\n[82] Commit #a06b04e of Azure/bicep.\n\n[83] S. H. Khandkar, \u201cOpen coding,\u201d University of Calgary, vol. 23, p. 2009, 2009.\n\n[84] R. J. Passonneau, \u201cMeasuring agreement on set-valued items (MASI) for semantic and pragmatic annotation,\u201d in Proceedings of the Fifth International Conference on Language Resources and Evaluation, LREC 2006, Genoa, Italy, May 22-28, 2006. European Language Resources Association (ELRA), 2006, pp. 831\u2013836.\n\n[85] K. Krippendorff, Content Analysis: An Introduction to its Methodology. Sage publications, 2018.\n\n[86] Pull Request #134 of skytable/skytable.\n\n[87] Comment from Issue #1190 of dependabot/dependabot-core.\n\n[88] Issue #1296 of dependabot/dependabot-core.\n\n[89] Pull Request #2635 of giantswarm/happa.\n\n[90] Commit #8cecf22 of Fate-Grand-Automata/FGA.\n\n[91] Pull Request #1976 of stoplightio/spectral.\n\n[92] Issue #1736 of dependabot/dependabot-core.\n\n[93] Issue #1297 of dependabot/dependabot-core.\n\n[94] Issue #202 of nitzano/gatsby-source-hashnode.\n\n[95] Issue #26 of replygirl/tc.\n\n[96] Pull Request #1987 of stoplightio/spectral.\n\n[97] Pull Request #2916 of codalab/codalab-worksheets.\n\n[98] Pull Request #126 of lyft/clutch.\n\n[99] Pull Request #3622 of video-dev/hls.js.\n\n[100] Comment from Issue #1973 of dependabot/dependabot-core.\n\n[101] Issue #60 of ahmadnassri/action-dependabot-auto-merge.\n\n[102] https://github.blog/changelog/label/dependabot/.\n\n[103] https://docs.renovatebot.com/.\n\n[104] https://docs.renovatebot.com/merge-confidence/.\n\n[105] M. Zimmermann, C. Staicu, C. Tenny, and M. Pradel, \u201cSmall world with high risks: A study of security threats in the npm ecosystem,\u201d in 28th USENIX Security Symposium, USENIX Security 2019, Santa Clara, CA, USA, August 14-16, 2019. USENIX Association, 2019, pp. 995\u20131010.\n\n[106] A. Godulla, M. Bauer, J. Dietlmeier, A. L\u00fcck, M. Matzen, and F. Vaa\u00dfen, \u201cGood bot vs. bad bot: Opportunities and consequences of using automated software in corporate communications,\u201d 2021.\n\n[107] Z. Peng and X. Ma, \u201cExploring how software developers work with mention bot in github,\u201d CCF Trans. Pervasive Comput. Interact., vol. 1, no. 3, pp. 190\u2013203, 2019.\n\n[108] D. Foo, H. Chua, J. Yeo, M. Y. Ang, and A. Sharma, \u201cEfficient static checking of library updates,\u201d in Proceedings of the 2018 ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/SIGSOFT FSE 2018, Lake Buena Vista, FL, USA, November 04-09, 2018. ACM, 2018, pp. 791\u2013796.\n\n[109] L. F. Cortes-Coy, M. L. V\u00e1squez, J. Aponte, and D. Poshyvanyk, \u201cOn automatically generating commit messages via summarization of source code changes,\u201d in 14th IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2014, Victoria, BC, Canada, September 28-29, 2014. IEEE Computer Society, 2014, pp. 275\u2013284.\n\n[110] L. Moreno, G. Bavota, M. D. Penta, R. Oliveto, A. Marcus, and G. Canfora, \u201cARENA: an approach for the automated generation of release notes,\u201d IEEE Trans. Software Eng., vol. 43, no. 2, pp. 106\u2013127, 2017.\n\n[111] D. Poole, A. Mackworth, and R. 
Runzhi He is currently an undergraduate student at the School of Electronics Engineering and Computer Science (EECS), Peking University. His research mainly focuses on open source sustainability and the software supply chain. He can be contacted at rzhe@pku.edu.cn.

Hao He is currently a Ph.D. student at the School of Computer Science, Peking University. Before that, he received his B.S. degree in Computer Science from Peking University in 2020. His research addresses socio-technical sustainability problems in open source software communities, ecosystems, and supply chains. More information can be found on his personal website, https://hehao98.github.io/, and he can be reached at hehao98@pku.edu.cn.

Yuxia Zhang is currently an assistant professor at the School of Computer Science and Technology, Beijing Institute of Technology (BIT). She received her Ph.D. in 2020 from the School of Electronics Engineering and Computer Science (EECS), Peking University. Her research interests include mining software repositories and open source software ecosystems, mainly focusing on commercial participation in open source. She can be contacted at yuxiazh@bit.edu.cn.

Minghui Zhou received the B.S., M.S., and Ph.D. degrees in computer science from the National University of Defense Technology in 1995, 1999, and 2002, respectively. She is a professor in the School of Computer Science at Peking University. She is interested in software digital sociology, i.e., understanding the relationships among people, project culture, and software products through mining the repositories of software projects. She is a member of the ACM and the IEEE. She can be reached at zhmh@pku.edu.cn.