Automating Dependency Updates in Practice: An Exploratory Study on GitHub Dependabot

Runzhi He, Hao He, Yuxia Zhang, Minghui Zhou

Abstract—Dependency management bots automatically open pull requests to update software dependencies on behalf of developers. Early research shows that developers are suspicious of updates performed by dependency management bots and feel tired of overwhelming notifications from these bots. Despite this, dependency management bots are becoming increasingly popular. Such contrast motivates us to investigate Dependabot, currently the most visible bot on GitHub, to reveal the effectiveness and limitations of state-of-the-art dependency management bots. We use exploratory data analysis and a developer survey to evaluate the effectiveness of Dependabot in keeping dependencies up-to-date, interacting with developers, reducing update suspicion, and reducing notification fatigue. We obtain mixed findings. On the positive side, projects do reduce technical lag after Dependabot adoption and developers are highly receptive to its pull requests. On the negative side, its compatibility scores are too scarce to be effective in reducing update suspicion; developers tend to configure Dependabot toward reducing the number of notifications; and 11.3% of projects have deprecated Dependabot in favor of other alternatives. The survey confirms our findings and provides insights into the key missing features of Dependabot. Based on our findings, we derive and summarize the key characteristics of an ideal dependency management bot, which can be grouped into four dimensions: configurability, autonomy, transparency, and self-adaptability.

Index Terms—Dependency Management, Software Engineering Bot, Dependabot, Mining Software Repositories

1 INTRODUCTION

To update or not to update, that is the question that has haunted software engineers for decades. The software engineering “gurus” would argue that keeping software dependencies up-to-date minimizes technical debt, increases supply chain security, and ensures software project sustainability in the long term [1]. Nonetheless, it requires not only substantial effort but also extra responsibility from developers. Consequently, many developers adhere to the practice of “if it ain't broke, don't fix it,” and the majority of existing software systems use outdated dependencies [2].

One promising solution to this dilemma is to use bots to automate all dependency updates. To this end, dependency management bots have been invented to automatically open pull requests (PRs) that update dependencies on a collaborative coding platform (e.g., GitHub), in the hope of saving developer effort. Recently, dependency management bots have become increasingly visible and gained high momentum among practitioners. Exemplars of these bots, including Dependabot [3], Renovate Bot [4], PyUp [5], and Snyk Bot [6], have opened millions of PRs on GitHub [7] and are adopted by a variety of industry teams (according to their websites).

However, the simple idea of using a bot does not save the world. The early work of Mirhosseini and Parnin [8] on Greenkeeper [9] reveals that only 32% of Greenkeeper PRs are merged, because developers are suspicious of whether a bot PR will break their code (i.e., update suspicion) and feel annoyed by the large number of bot PRs (i.e., notification fatigue). Since then, similar bots have emerged, evolved, and gained high popularity; the most visible one on GitHub is Dependabot [7], which incorporates many improvements (Section 2.3). However, it remains unknown to what extent these bots can overcome the two limitations of Greenkeeper identified by Mirhosseini and Parnin [8] in 2017.

To shed light on improving dependency management bots and software engineering bots in general, we present an exploratory study on Dependabot. Our study answers the following four research questions (RQs) to empirically evaluate the effectiveness of Dependabot version update in different dimensions (detailed motivations in Section 3):

  • RQ1: To what extent does Dependabot reduce the technical lag of a project after its adoption?
  • RQ2: How actively do developers respond to and merge pull requests opened by Dependabot?
  • RQ3: How effective is Dependabot's compatibility score in allaying developers' update suspicion?
  • RQ4: How do projects configure Dependabot for automating dependency updates?

As we find that many projects have deprecated Dependabot in favor of other alternatives, we ask an additional RQ:

  • RQ5: How do projects deprecate Dependabot and what are the developers' desired features for Dependabot?

To answer the RQs, we sample 1,823 popular and actively maintained GitHub projects as the study subjects. We conduct exploratory data analysis on 502,752 Dependabot PRs from these projects and use a survey of 131 developers to triangulate our findings. Our findings provide empirical characterizations of Dependabot's effectiveness in various dimensions. More importantly, we discover important limitations of Dependabot (a state-of-the-art bot) in overcoming update suspicion and notification fatigue, along with the missing features for overcoming the limitations. Based on the findings, we summarize four key properties of an ideal dependency management bot (i.e., configurability, autonomy, transparency, and self-adaptability) as a roadmap for software engineering researchers and bot designers.

2 BACKGROUND AND RELATED WORK

2.1 Dependency Update

In modern software development, updating dependencies is not only important but also non-trivial. A typical software project may have tens to thousands of dependencies and each of the outdated ones induces risks [10]. However, each update may contain breaking changes which can be hard to discover and fix [11]. This situation inspires research into understanding update practices, designing metrics, and inventing approaches to support dependency updates.

Bavota et al. [12] find that updates in the Apache ecosystems are triggered by major changes or a large number of bug fixes, but may be prevented by API removals. Kula et al. [2] discover that 81.5% of the 4,600 studied Java/Maven projects on GitHub still keep outdated dependencies due to lack of awareness and extra workload. Pashchenko et al. [13] find through semi-structured interviews that developers face trade-offs when updating dependencies (e.g., vulnerabilities, breaking changes, policies).

Researchers have proposed measurements to quantify the “freshness” or “outdatedness” of software dependencies and applied them to various software ecosystems. Cox et al. [14] propose several metrics to quantify “dependency freshness” and evaluate them on a dataset of industrial Java systems. A series of studies [15], [16], [17], [18], [19], [20] introduce the notion of technical lag, a metric for measuring the extent of project dependencies lagging behind their latest releases, and investigate the evolution of technical lag in Debian [15], npm [16], [17], [18], the Libraries.io dataset [19], and Docker images [20]. They find that technical lag tends to increase over time, induces security risks, and can be mitigated using semantic versioning.

There has been a long line of research in software engineering for supporting the automated update of software. Since API breaking changes form the majority of update cost, most studies propose automated approaches to match and adapt evolving APIs (e.g., [21], [22], [23], [24], [25]). However, Cossette and Walker [26] reveal through manual analysis that real API adaptation tasks are complex and beyond the capability of previous automated approaches. Recently, research interest in automated API adaptation is surging again with works on Java [27], JavaScript [28], Python [29], Android [30], etc.

On the other hand, practitioners often take conservative update approaches: upstream developers typically use semantic versioning [31] for signaling version compatibility; downstream developers perform most updates manually and detect incompatibilities through release notes, compilation failures, and regression testing. Unfortunately, studies [32], [33], [34], [35] reveal that none of them work well in guaranteeing update compatibility. Generally, providing such guarantees is still a challenging open problem [36].

2.2 Dependency Management Bots

Perhaps the most noticeable automation effort among practitioners is dependency management bots. These bots automatically create pull requests (PRs) to update dependencies either immediately after a new release is available or when a security vulnerability is discovered in the currently used version. In other words, dependency management bots solve the lack of awareness problem [2] by automatically pushing update notifications to developers.

Mirhosseini and Parnin [8] conduct a pioneering study on Greenkeeper and find that developers update dependencies 1.6x more frequently with Greenkeeper, but only 32% of Greenkeeper PRs are merged due to two major limitations:

  • Update Suspicion: If an automated update PR breaks their code, developers immediately become suspicious of subsequent PRs and are reluctant to merge them.
  • Notification Fatigue: If too many automated update PRs are generated, developers may feel annoyed about the notifications and simply ignore all the update PRs.

Rombaut et al. [37] find that Greenkeeper issues for in-range breaking updates induce a large maintenance overhead, and many of them are false alarms caused by project CI issues.

The limitations of Greenkeeper align well with the challenges revealed in the software engineering (SE) bot literature. Wessel et al. [38] find that SE bots on GitHub have interaction problems and provide poor decision-making support. Erlenhov et al. [39] identify two major challenges in “Alex” bot (i.e., SE bots that autonomously perform simple tasks) design: establishing trust and reducing interruption/noise. Wyrich et al. [7] find that bot PRs have a lower merge rate and need more time to be interacted with and merged. Two subsequent studies by Wessel et al. [40], [41] qualitatively show that noise is the central challenge in SE bot design but it can be mitigated by certain design strategies and the use of a “meta-bot.” Shihab et al. [42] draw a picture of SE bot technical and socio-economic challenges. Santhanam et al. [43] provide a systematic mapping of the SE bot literature.

Since Mirhosseini and Parnin [8], many other bots have emerged for automating dependency updates, such as Dependabot [3] (preview release in May 2017) and Renovate Bot [4] (first release in January 2017). Greenkeeper itself reached end-of-life in June 2020 and its team merged with Snyk Bot [6]. All these bots are widely used: according to Wyrich et al. [7], they opened the vast majority of bot PRs on GitHub (six out of the top seven). The top two spots are occupied by Dependabot [3] and Dependabot Preview [44] with 3 million PRs and 1.2 million PRs, respectively. Erlenhov et al. [45] find that under a strict SE bot definition, almost all bots in an existing bot commit dataset [46] are dependency management bots and that they are frequently adopted, discarded, switched, and even simultaneously used by GitHub projects, indicating fierce competition among them.

2.3 Dependabot

Among different dependency management bots, Dependabot [3] is the most visible one in GitHub projects [7]. Dependabot Preview was launched in 2017 [47] and acquired by GitHub in 2019 [48]. In August 2021, it was shut down in favor of the new, GitHub native Dependabot [49] operating since June 2020, which offers two main services:

  • Dependabot version update [50]: If a configuration file named dependabot.yml is added to a GitHub repository, Dependabot will begin to open PRs that update project dependencies to the latest version. Developers can specify the exact Dependabot behavior in dependabot.yml (e.g., update interval and the max number of PRs).

  • Dependabot security update [51]: Dependabot scans the entire GitHub to find repositories with vulnerable dependencies. Even if no dependabot.yml is supplied, Dependabot still alerts repository owners and repository owners can tell Dependabot to open PRs that update vulnerable dependencies to their patched versions.
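For illustration, a minimal dependabot.yml enabling the version update service might look as follows (the field names follow the GitHub documentation [50]; the ecosystem and schedule shown are arbitrary examples):

```yaml
# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "npm"   # which package manager to watch
    directory: "/"             # where the manifest (package.json) lives
    schedule:
      interval: "daily"        # how often Dependabot checks for new versions
    open-pull-requests-limit: 5  # max number of simultaneously open update PRs
```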

Figure 1 shows an example Dependabot PR [52]. Among its many details, one especially interesting Dependabot feature is the compatibility score badge. According to the GitHub documentation [53]: An update's compatibility score is the percentage of CI runs that passed when updating between specific versions of the dependency. In other words, the score uses the large-scale regression testing data available in GitHub CI test results to estimate the risk of breaking changes in a dependency update. This looks like a promising direction for solving the update suspicion problem, as previous studies have shown that project test suites are often unreliable in detecting update incompatibilities [34] and the false alarms introduce significant maintenance overhead [37]. However, the score's effectiveness in practice remains unknown.

For the notification fatigue problem, Wessel et al. [40] suggest SE bots offer flexible configurations and send only relevant notifications. Both solutions have been (in principle) implemented by Dependabot, but it is still unclear whether the specific configuration options and notification strategies taken by Dependabot are really effective in practice. Alfadel et al. [54] find that developers receive Dependabot security PRs well: 65.42% of the PRs are merged and most are merged within a day. However, security PRs constitute only a small portion of Dependabot PRs (6.9% in our dataset), and developers perceive security updates as highly relevant [13]. The effectiveness of Dependabot version update in general seems to be more problematic. Soto-Valero et al. [55] find that Dependabot opens many PRs on bloated dependencies. Cogo and Hassan [56] provide evidence on how the configuration of Dependabot causes issues for developers. As stated by two developers in a GitHub issue [57]: 1) I think we'd rather manage dependency upgrades ourselves, on our own time. We've been frequently bitten by dependency upgrades causing breakages. We tend to only upgrade dependencies when we're close to being ready to cut a release. 2) Also Dependabot tends to be pretty spammy, which is rather annoying.

To the best of our knowledge, a comprehensive empirical investigation into the adoption of the Dependabot version update service is still lacking. Such knowledge from Dependabot can help the formulation of general design guidelines for dependency management bots and unveil important open challenges for fulfilling these guidelines.

3 RESEARCH QUESTIONS

Our study goal is to evaluate the practical effectiveness of the Dependabot version update service. In this section, we elaborate on the motivation of each RQ toward this goal.

The Dependabot version update service is designed to make developers aware of new versions and help them keep project dependencies up-to-date. To quantitatively evaluate the extent to which Dependabot fulfills its main design purpose (i.e., keeping dependencies up-to-date), we reuse metrics from the technical lag literature [16], [18] and ask:

RQ1: To what extent does Dependabot reduce the technical lag of a project after its adoption?

To help developers keep dependencies up-to-date, Dependabot intervenes by automatically creating update PRs when new versions become available, after which developers can interact with (e.g., comment, merge) these PRs. We evaluate the effectiveness of this interaction process by measuring the extent to which developers interact smoothly with Dependabot PRs, forming the next RQ:

RQ2: How actively do developers respond to and merge pull requests opened by Dependabot?

One major limitation of Greenkeeper is that developers tend to be suspicious of whether a dependency update will introduce breaking changes [8] (i.e., update suspicion). On the other hand, Dependabot helps developers establish confidence in update PRs using the compatibility score feature (Section 2.3). To quantitatively evaluate the effectiveness of this feature against update suspicion, we ask:

RQ3: How effective is Dependabots compatibility score in allaying developers update suspicion?

The other major limitation of Greenkeeper is that developers tend to be overwhelmed by a large number of update PRs [8] (i.e., notification fatigue). On the other hand, Dependabot provides flexible configuration options for controlling the amount of notifications (Section 2.3). To explore how developers configure (and re-configure) the number of notifications generated by Dependabot, we study real-world Dependabot configurations and ask:

RQ4: How do projects configure Dependabot for automating dependency updates?

During our analysis, we discover that a non-negligible portion of projects in our studied corpus have deprecated Dependabot and migrated to other alternatives. As an in-depth retrospective analysis of the reasons behind these deprecations can help reveal important Dependabot limitations and future improvement directions, we ask:

RQ5: How do projects deprecate Dependabot and what are the developers' desired features for Dependabot?

4 STUDY DESIGN

An overview of our study is shown in Figure 2. The study follows a mixed-methods design where we obtain results from repository data analysis and triangulate them with a developer survey. In this section, we introduce the data collection and survey methods. The specific analysis methods will be presented along with their results in Section 5.

4.1 Data Collection

Project Selection. As the first step, we need to collect a sample of engineered and maintained GitHub projects that use or once used Dependabot version update in their workflows. We focus on the GitHub native Dependabot (released on June 1, 2020) and do not include Dependabot Preview in our study because the former provides much richer features and allows us to obtain the latest, state-of-the-art results.

We begin with the latest dump of GHTorrent [58] (released on March 6, 2021), a large-scale dataset of GitHub projects widely used in software engineering research (e.g., [7], [34]). We find a noticeable gap in the GHTorrent dataset from July 2019 to early January 2020 (also observed by Wyrich et al. [7]). Focusing solely on GitHub native Dependabot allows us to circumvent threats caused by this gap because all its PRs are created after January 2020.

We select projects with at least 10 merged Dependabot PRs to keep only projects that have used Dependabot to some degree. To filter out irrelevant, low-quality, or unpopular projects, we retain only non-fork projects with at least 10 stars, as inspired by previous works [55], [59], [60]. Since projects without sustained activity may not perform dependency updates on a regular basis and would induce noise in the technical lag analysis (RQ1), we query GitHub APIs [61] and retain projects with a median of at least one commit per week over the past year. To exclude projects that have never utilized Dependabot version update, we clone and retain only projects with some git change history on dependabot.yml. After all the filtering steps, we end up with 1,823 projects.
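As an illustration of the activity filter, a minimal sketch using the GitHub commit-activity statistics endpoint (this endpoint may return HTTP 202 while GitHub computes the statistics; retry handling is omitted here):

```python
# Sketch: keep projects whose median weekly commit count over the past
# year is at least one, via GET /repos/{owner}/{repo}/stats/commit_activity.
import statistics
import requests

def is_actively_maintained(owner: str, repo: str, token: str) -> bool:
    url = f"https://api.github.com/repos/{owner}/{repo}/stats/commit_activity"
    resp = requests.get(url, headers={"Authorization": f"token {token}"})
    if resp.status_code == 202:   # stats still being computed; retry later
        raise RuntimeError("commit activity not ready yet")
    resp.raise_for_status()
    weekly_totals = [week["total"] for week in resp.json()]  # 52 weekly buckets
    return statistics.median(weekly_totals) >= 1
```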

PR Collection. We use the GitHub REST API [61] and a web scraper to find all Dependabot PRs (before February 14, 2022) in the projects and collect PR statistics, CI test results, and timeline events. By leveraging a distributed pool of Cloudflare workers [62], this web scraper enables us to bypass the limitations of the GitHub APIs (which are unhandy for collecting CI test results for PRs) and retrieve PR events and CI test results at scale. The PR body tells which dependency the PR is updating, its current version, and its updated version. By the end of this stage, we obtain 540,665 Dependabot PRs (71.1% with a CI test result), updating 15,590 dependencies between 167,841 version pairs.
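The same information also appears in the PR title, which for Dependabot typically reads like "Bump lodash from 4.17.20 to 4.17.21"; a title-based parsing sketch (the pattern is illustrative and does not cover every title template Dependabot uses):

```python
# Sketch: parse the updated dependency and version pair from a
# Dependabot PR title.
import re

TITLE_RE = re.compile(r"[Bb]ump (?P<dep>\S+) from (?P<old>\S+) to (?P<new>\S+)")

def parse_pr_title(title: str):
    m = TITLE_RE.search(title)
    return (m.group("dep"), m.group("old"), m.group("new")) if m else None

assert parse_pr_title("Bump lodash from 4.17.20 to 4.17.21") == \
    ("lodash", "4.17.20", "4.17.21")
```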

Our next task is to identify security updates among all the PRs created by Dependabot. However, Dependabot no longer labels security updates for security reasons. Instead, Dependabot shows a banner on the PR web page which is only visible to repository administrators by default [51]. Therefore, we choose to construct a mirror of the GitHub security advisory database [63] and identify security PRs ourselves by checking whether the PR updates a version with a vulnerability entry at the time of PR creation. More specifically, we identify a PR as a security update PR if: 1) the dependency and its current version match a vulnerability in the GitHub security advisory database; 2) the updated version is not older than the version that fixes this vulnerability (i.e., no vulnerability after the update); 3) the PR is created after the vulnerability disclosure in CVE. Eventually, we identify 37,313 security update PRs (6.9%) out of the 540,665 Dependabot PRs in total.
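A condensed sketch of the three criteria; `Advisory` is a hypothetical record mirroring one entry of our advisory database mirror, and exact-version matching stands in for the real vulnerable-range matching (version semantics also differ per ecosystem):

```python
# Sketch: decide whether a Dependabot PR is a security update PR.
from dataclasses import dataclass
from datetime import datetime
from packaging.version import Version  # simplification; ecosystems differ

@dataclass
class Advisory:
    dependency: str
    vulnerable_version: str  # a version affected by the vulnerability
    patched_version: str     # the first version fixing it
    disclosed_at: datetime   # CVE disclosure time

def is_security_pr(dep, old, new, created_at, advisories):
    for adv in advisories:
        if (adv.dependency == dep
                and Version(old) == Version(adv.vulnerable_version)  # 1) current version vulnerable
                and Version(new) >= Version(adv.patched_version)     # 2) no vulnerability after update
                and created_at > adv.disclosed_at):                  # 3) opened after disclosure
            return True
    return False
```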

Dataset Overview. As illustrated in Table 1, projects in our dataset are mostly engineered, popular GitHub projects with large code bases, active maintenance, rich development histories, and frequent Dependabot usage. We notice a long-tail distribution in the metrics concerning project size, i.e., number of contributors, lines of code, and commit frequency, which is expected and common in mining software repositories (MSR) datasets [35], [64], [65].

The plurality (44.1%) of projects in our dataset use the npm package ecosystem, followed by Maven (12.3%), PyPI (11.7%), and Go modules (7.8%). Among the Dependabot PRs, those that update npm packages constitute an even higher portion (64.9%), followed by PyPI (8.9%), Go modules (4.3%), Bundler (3.9%), and Maven (3.9%), as packages in the npm ecosystem generally evolve faster [66].

Dependabot has opened hundreds of PRs for most of the projects (mean = 304, median = 204), even up to thousands for some of them. This likely indicates a high workload for project maintainers. In terms of the most updated dependencies, it is not surprising that all

TABLE 1: Statistics of the Studied Projects (Top) and the Survey Respondents (Bottom)

| Statistics | Mean | Median |
|---|---|---|
| *Studied projects (1,823 in total)* | | |
| # of Stars | 1423.92 | 66.00 |
| # of Commits | 2837.11 | 1040.50 |
| # of Contributors | 26.50 | 12.00 |
| Lines of Code (thousands) | 98.18 | 19.89 |
| # of Commits per Week | 10.07 | 4.00 |
| Age at Adoption (days) | 1018.18 | 714.00 |
| # of Dependabot PRs | 304.56 | 204.00 |
| *Survey respondents (131 in total)* | | |
| # of Dependabot Interactions | 644.54 | 410.00 |
| # of Commits | 477.00 | 331.50 |
| # of Followers | 168.00 | 53.50 |
| Years of Experience (GitHub) | 10.37 | 10.68 |
TABLE 2: Survey Questions and Their Results (131 Responses in Total)

| 5-Point Likert-Scale Questions | Avg. |
|---|---|
| (RQ1) Dependabot helps my project keep all dependencies up-to-date. | 4.44 |
| (RQ2) Dependabot PRs do not require much work to review and merge. | 3.94 |
| (RQ2) I respond to a Dependabot PR fast if it can be safely merged. | 4.42 |
| (RQ2) I ignore the Dependabot PR or respond slower if it cannot be safely merged. | 3.78 |
| (RQ2) I handle a Dependabot PR with higher priority if it updates a vulnerable dependency. | 4.19 |
| (RQ2) It requires more work to review and merge a Dependabot PR if it updates a vulnerable dependency. | 2.49 |
| (RQ2) Dependabot often opens more PRs than I can handle. | 2.73 |
| (RQ3) Compatibility scores are often available in Dependabot PRs. | 2.95 |
| (RQ3) If a compatibility score is available, it is effective in indicating whether the update will break my code. | 2.95 |
| (RQ4) Dependabot can be configured to fit the needs of my project. | 3.54 |
| (RQ4) I configure Dependabot to make it less noisy (i.e., only update certain dependencies, scan less frequently, etc.). | 3.27 |

Multiple Choice Questions

| Question | Result |
|---|---|
| (RQ5) Are your GitHub repositories still using Dependabot for automating version updates? | 0.89 |
| (RQ5) If not, why? | (Results in § 5.5) |

Open-Ended Questions

| Question | Result |
|---|---|
| (RQ5) Regardless of current availability, what are the features you want most for a bot that updates dependencies? Do you have any further opinions or suggestions? | (Results in § 5.5) |

Where appropriate, we also use evidence from open-ended question responses to support the results in RQ1–RQ4.

  1. The survey has been approved by the Ethics Committee of Key Laboratory of High Confidence Software Technology, Ministry of Education (Peking University) under Grant No. CS20220011.

For each candidate, we send personalized emails (with information about how they used Dependabot) to avoid being perceived as spam. We try our best to follow common survey ethics [71], e.g., clearly introducing the purpose of this survey and being transparent about what we will do with their responses. To increase the chance of getting a response and to contribute back to the open-source community, we offer to donate $5 to an open-source project of the respondent's choice if they opt in. Therefore, we believe we have done minimal harm to the open-source developers we have contacted, and the results we get about Dependabot far outweigh the harm. In fact, we get several highly welcoming responses from the survey participants, such as: 1) keep up the good work! 2) If you would like to consult more, just ping me on ...Cheers!

The bottom half of Table 1 summarizes the demographics of the 131 survey respondents, showing that they are highly experienced with both Dependabot (a median of 410 interactions) and open source development (five to 15 years of experience, hundreds of commits, and many followers).

5 METHODS AND RESULTS

5.1 RQ1: Technical Lag

5.1.1 Repository Analysis Methods

We evaluate the effectiveness of Dependabot version updates by comparing the project technical lag at two time points: the day of Dependabot adoption (T_0) and 90 days after adoption (i.e., T_0 + 90). We choose 90 days as the interval to avoid the influence of deprecations, as more than 85% of them happen later than 90 days after adoption. Since technical lag naturally increases over time [16], [18], we include an additional time point for comparison: 90 days before adoption (i.e., T_0 - 90).

For a project p at time t \in \{T_0 - 90, T_0, T_0 + 90\}, we denote all its direct dependencies as \text{deps}(p, t) and define the technical lag of project p at time t as:

\text{techlag}(p, t) = \frac{\sum_{d \in \text{deps}(p, t)} \max\left(0,\; t_{\text{latest}}(d) - t_{\text{adopted}}(d)\right)}{|\text{deps}(p, t)|}

Here t_{\text{latest}}(d) denotes the release time of d's latest version at time t and t_{\text{adopted}}(d) denotes the release time of d's adopted version. We use \max to guard against the occasional case of t_{\text{latest}}(d) < t_{\text{adopted}}(d) (e.g., developers may continue to release 0.9.x versions after the release of 1.0.0).
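A direct transcription of this definition as a sketch; the release-time lookups are hypothetical helpers over package registry metadata:

```python
# Sketch of the techlag definition above. `latest_release_time(d, t)` and
# `adopted_release_time(d)` are hypothetical helpers returning datetimes.
def techlag(deps, t, latest_release_time, adopted_release_time):
    """Mean per-dependency lag (in days) of a project's dependencies at time t."""
    if not deps:
        return 0.0
    total_seconds = 0.0
    for d in deps:
        lag = (latest_release_time(d, t) - adopted_release_time(d)).total_seconds()
        total_seconds += max(0.0, lag)  # guard: adopted version may postdate "latest"
    return total_seconds / len(deps) / 86400  # seconds -> days
```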

This technical lag definition is inspired by Zerouali et al. [18] but with several adjustments. First, we use only their time-based variant instead of their version-based variant because cross-project comparisons would not be intuitive using the latter. Second, we use the mean value of all dependencies instead of maximum or median as the overall technical lag, because we intend to measure the overall effectiveness of Dependabot for both keeping most dependencies up-to-date and eliminating the most outdated ones.

We exclude projects with an age of fewer than 90 days at Dependabot adoption and projects that deprecate Dependabot within 90 days. We also exclude projects that migrated from Dependabot Preview since they may introduce bias into the results. Since the computation of technical lag based on dependency specification files and version numbers requires non-trivial implementation work for each package ecosystem, we limit our analysis to JavaScript/npm, the most popular ecosystem in our dataset. We further exclude projects with no eligible npm dependencies configured for Dependabot at T_0 - 90, T_0, or T_0 + 90. After all the filtering, we retain 613 projects for answering RQ1.

We adopt the Regression Discontinuity Design (RDD) framework to estimate the impact of adopting Dependabot on project technical lag. RDD uses the level of discontinuity before/after an intervention to measure its effect size while taking the influence of an overall background trend into consideration. Given that technical lag tends to increase naturally over time [16], [17], [18], RDD is a more appropriate statistical modeling approach for our case than hypothesis testing approaches (e.g., one-sided Wilcoxon rank-sum tests). Following previous SE works that utilized RDD [72], [73], we use sharp RDD, i.e., segmented regression analysis of interrupted time series data. We treat project-level technical lag as a time series, compute the technical lag for each project every 15 days from T_0 - 90 to T_0 + 90, use ordinary least squares regression to fit the RDD model, and watch for the presence of a discontinuity at Dependabot adoption, formalized as the following model:


y_i = \alpha + \beta \cdot \text{time}_i + \gamma \cdot \text{intervention}_i + \theta \cdot \text{time-after-intervention}_i + \sigma_i

Here y_i denotes the output variable (i.e., technical lag for each project in our case); \text{time} stands for the number of days since T_0 - 90; \text{intervention} binarizes the presence of Dependabot (0 before adopting Dependabot, 1 after adoption); \text{time-after-intervention} counts the number of days since T_0 (0 before T_0).
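This model can be fitted with ordinary least squares, e.g., via statsmodels; a minimal sketch assuming a long-format table with one observation per project per 15-day step (the file and column names are ours, not from a replication package):

```python
# Sketch of the sharp RDD fit. `techlag_timeseries.csv` is a hypothetical
# export with columns: project, days_from_start (0 at T0-90), techlag.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("techlag_timeseries.csv")
df["time"] = df["days_from_start"]
df["intervention"] = (df["time"] >= 90).astype(int)   # 1 after adoption (T0)
df["time_after"] = (df["time"] - 90).clip(lower=0)    # days since T0, 0 before

model = smf.ols("techlag ~ time + intervention + time_after", data=df).fit()
print(model.summary())  # the coefficient of `intervention` estimates the discontinuity
```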

5.1.2 Repository Analysis Results

We present technical lags and their deltas between time points in Table 3. We plot diagrams in Figure 3 to reflect how different projects increase/decrease their technical lag from T_0 - 90 to T_0 + 90. The first surprising fact we notice is that the technical lag of approximately one-third (216/613) of the projects is already decreasing between T_0 - 90 and T_0, even though technical lag tends to increase over time [16], [18]. This indicates that these projects were already taking a proactive dependency update strategy even before adopting Dependabot. On the other hand, for about half (303/613) of the projects, the technical lag increases prior to Dependabot adoption, and 94 projects keep their technical lag unchanged. For all projects, the mean and median technical lag at T_0 - 90 are 73.68 and 16.27 days, respectively; they decrease at T_0 to 48.99 and 13.96 days, respectively; at T_0, 159 (25.9%) of the 613 projects have already achieved zero technical lag.

Between T_0 and T_0 + 90, projects lower their technical lag even further, from a mean of 48.99 days and a median of 13.96 days to a mean of 25.38 days and a median of 3.62 days. Among the 303 projects with an increasing technical lag between T_0 - 90 and T_0, about two-thirds (220) see a decrease after adopting Dependabot; among the 216 projects with decreasing technical lag, nearly half (94) see a further decrease. More than one-third (219, 35.7%) of the projects achieve completely zero technical lag 90 days after Dependabot adoption. Although there are still some increases, the magnitude is much smaller (e.g., a 75% quantile of only +1.75 days between T_0 and T_0 + 90 compared with a 75% quantile of +14.37 days between T_0 - 90 and T_0).

Table 4 shows that the regression variable \text{intervention} has a statistically significant negative coefficient (\text{coef.} = -31.2137, p < 0.001), indicating that the adoption of Dependabot might have reduced technical lag and kept dependencies up-to-date in the sampled 613 projects. A more straightforward look at this trend can be observed in Figure 4: at T_0, project-level technical lag has a noticeable decrease, and there is a discontinuity between the linear-fitted technical lag before/after adoption. \text{time} and \text{time-after-intervention} have negative coefficients, echoing our earlier findings: the technical lag of the sampled projects was already decreasing before Dependabot adoption, and the introduction of Dependabot adds to this decreasing trend. However, neither coefficient is comparable in magnitude to that of \text{intervention}, and neither is statistically significant (p > 0.3).

TABLE 3: Technical Lag (days) for 613 npm Projects

| Metric | Mean | Median |
|---|---|---|
| \text{techlag}(p, T_0 - 90) | 73.68 | 16.27 |
| \Delta in between | -24.96 | 0.00 |
| \text{techlag}(p, T_0) | 48.99 | 13.96 |
| \Delta in between | -23.61 | -0.61 |
| \text{techlag}(p, T_0 + 90) | 25.38 | 3.62 |

Fig. 3: Project-level technical lag changes (T_0 - 90, T_0, T_0 + 90).

TABLE 4: The Estimated Coefficients and Significance Levels for the RDD Model We Fit (Section 5.1.1). *: p < 0.001.

| Feature | Coef. | Std. Err. | t | p |
|---|---|---|---|---|
| Intercept* | 66.5209 | 4.595 | 14.477 | 0.000 |
| \text{intervention}* | -31.2137 | 5.694 | -5.306 | 0.000 |
| \text{time} | -0.0743 | 0.079 | -0.945 | 0.345 |
| \text{time-after-intervention} | -0.1011 | 0.100 | -1.008 | 0.314 |

5.1.3 Triangulation from Survey

Most developers agree that Dependabot is helpful in keeping their project dependencies up-to-date: 55.8% responded with Strongly Agree and 35.7% with Agree (Table 2). As noted by one developer: Dependabot does a great job of keeping my repositories current. This is because Dependabot serves well as an automated notification mechanism that tells them about the presence of new versions and pushes them to update their dependencies. As mentioned by two developers: 1) Dependabot is a wonderful way for me to learn about major/minor updates to libraries. 2) Dependabot can be a bit noisy, but it makes me aware of my dependencies.

However, some of the developers do not favor using Dependabot for automating dependency updates but only use Dependabot as a way of notification. For example: 1) I just use it for notifications about updates, but do them manually and check if anything broke in the process. 2) I am just using Dependabot to tell me if there is something to update and then update all in a single shot with plain package managers.

This indicates that they do not trust the reliability of Dependabot for automating updates and they do not think the current design of Dependabot can help them reduce the manual workload of updates. As an example, one developer states that: Dependency management is currently much easier just utilizing yarn/npm. We use Dependabot merely because it has been recommended, but updating dependencies was faster when I solely used the command line.

One developer suggests that using Dependabot only for update notifications has become such a common use case that they would prefer a dedicated, less noisy tool solely designed for this purpose: It (Dependabot) becomes more like an update notification, i.e. I'm leveraging only half of its capability. Could there be something designed solely for this purpose? Less invasive, more informative, and instead of creating a PR for every package's update, I would like to see a panel-style hub to collect all the information for me to get a better overview in one place.


Findings for RQ1:

90 days after adopting Dependabot, projects decrease their technical lag from an average of 48.99 days to an average of 25.38 days. 35.7% of projects achieve zero technical lag 90 days after adoption. The adoption of Dependabot is a statistically significant intervention as indicated by RDD. Developers agree on its effectiveness in notifying updates, but question its effectiveness in automating updates.

5.2 RQ2: Developers' Response to Pull Requests

5.2.1 Repository Analysis Methods

Inspired by prior works [7], [54], we use the following metrics to measure the receptiveness (i.e., how actively developers merge) and responsiveness (i.e., how actively developers respond) of Dependabot PRs:

  • Merge Rate: The proportion of merged PRs.
  • Merge Lag: The time it takes for a PR to be merged.
  • Close Lag: The time it takes for a PR to be closed (i.e., not merged into the project code base).
  • Response Lag: The time it takes for a PR to receive human interaction, including any observable action in the PR's timeline, e.g., adding a label or assigning a reviewer.

The merge rate is intended to measure receptiveness and the latter three are intended to measure responsiveness.

We assume that results may differ for PRs in different groups. We expect that 1) developers are both more receptive and more responsive to security updates due to the higher priority of eliminating security vulnerabilities; and 2) projects that use Dependabot version update (i.e., contain dependabot.yml) are more responsive to Dependabot PRs. To verify our expectations, we divide PRs into three groups:

  • regular: Dependabot PRs that update a package to its latest version when the old version does not contain any known security vulnerabilities.
  • sec/conf: Security PRs that update a package with vulnerabilities to its patched version and are opened when the project has a dependabot.yml file in its repository (i.e., using Dependabot version update).
  • sec/nconf: Security PRs opened when the project does not have a dependabot.yml file in its repository. These PRs are opened either before the adoption or after the deprecation of Dependabot version update.

We examine the significance of inter-group metric differences with unpaired Mann-Whitney tests and Cliff's delta (\delta). Following Romano et al. [74], we consider the effect size as negligible for |\delta| \in [0, 0.147), small for |\delta| \in [0.147, 0.33), medium for |\delta| \in [0.33, 0.474), and large otherwise.

5.2.2 Repository Analysis Results

Fig. 4: Project-level technical lag changes (from T_0 - 90 to T_0 + 90). Red and green lines represent the linear-fitted technical lag before/after Dependabot adoption, respectively.

TABLE 5: PR Statistics in Different Groups. All lags are measured in days. \bar{x} represents the mean and \mu represents the median over all PRs in this group.

| Statistics | regular | sec/conf | sec/nconf |
|---|---|---|---|
| # of PRs | 502,752 | 13,406 | 23,907 |
| Merge Rate | 70.13% | 73.71% | 76.01% |
| Merge Lag | \bar{x}=1.76, \mu=0.18 | \bar{x}=3.45, \mu=0.18 | \bar{x}=8.15, \mu=0.76 |
| Close Lag | \bar{x}=8.63, \mu=3.00 | \bar{x}=14.42, \mu=5.00 | \bar{x}=26.83, \mu=5.71 |
| Resp. Lag | \bar{x}=2.27, \mu=0.17 | \bar{x}=3.74, \mu=0.17 | \bar{x}=8.59, \mu=0.51 |

Table 5 shows the PR statistics we obtain for each group. The high merge rates (>70%) indicate the projects are highly receptive to Dependabot PRs regardless of whether they are security-related. They are even more receptive to security PRs: their merge rate is 74.53%, higher than the 65.42% reported on Dependabot Preview security updates [54]. This may be because projects welcome security updates even more, or simply because of the projects we selected.
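For reference, the inter-group comparisons used in this RQ can be sketched as follows; the lag lists are illustrative placeholders, and the O(n·m) Cliff's delta is naive (a rank-based formula is preferable at our dataset's scale):

```python
# Sketch of the statistical comparison: unpaired Mann-Whitney U test plus
# Cliff's delta with the magnitude thresholds of Romano et al. [74].
from scipy.stats import mannwhitneyu

def cliffs_delta(xs, ys):
    # Naive O(len(xs) * len(ys)) implementation for illustration only.
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))

def magnitude(delta):
    d = abs(delta)
    if d < 0.147: return "negligible"
    if d < 0.33:  return "small"
    if d < 0.474: return "medium"
    return "large"

regular_lags = [0.1, 0.2, 0.5, 3.0]   # illustrative per-PR merge lags (days)
security_lags = [0.2, 0.8, 5.0, 9.0]
stat, p = mannwhitneyu(regular_lags, security_lags)
delta = cliffs_delta(security_lags, regular_lags)
print(f"p = {p:.3f}, delta = {delta:.2f} ({magnitude(delta)})")
```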

Alfadel et al. [54] find that Dependabot security PRs take longer to close than to merge. Our data illustrate a similar story: regular Dependabot PRs take a median of 0.18 days (≈ four hours) to merge and a median of 3.00 days to close. The difference is statistically significant with a large effect size (p < 0.001, \delta = 0.91).

The response lag, however, does not differ much from the merge lag across all groups, which confirms the timeliness of developers' responses to Dependabot PRs. We observe human activities in 360,126 (72.2%) Dependabot PRs, among which 280,276 (77.8%) take less than one day to respond. However, this also indicates an inconsistency between fast responses and slow closes. For a glance at what causes this inconsistency, we sample ten closed PRs with developer activities before closing and inspect their event histories. We find that 9 out of 10 PRs are closed by Dependabot itself because the PR became obsolete due to the release of a newer version or a manual upgrade (similar to the observation by Alfadel et al. [54]). Activities are development-related (e.g., starting a discussion, assigning reviewers) in 5 PRs, while the rest are interactions with Dependabot (e.g., @dependabot rebase).

Surprisingly, security PRs require a longer time to merge (p < 0.001, \delta = 0.87), close (p < 0.001, \delta = 0.72), and respond to (p < 0.001, \delta = 0.87), with large effect sizes, regardless of whether the project is using Dependabot version update. Though Dependabot version update users do process security updates quicker (at least merge lag and response lag are noticeably shorter), the difference is not significant, with negligible or small effect sizes (\delta \leq 0.23).

5.2.3 Triangulation from Survey

In general, developers agree that Dependabot PRs do not require much work to review and merge (34.1% Strongly Agree, 40.3% Agree, 14.0% Neutral).

We find that they follow two different patterns of using Dependabot. One pattern is to rapidly merge the PR if the tests pass and to perform the update manually otherwise (65.2% Strongly Agree, 19.7% Agree, 9.1% Neutral). In the latter case, they will respond to the Dependabot PR slower, or let Dependabot automatically close the PR after the manual update (36.4% Strongly Agree, 26.5% Agree, 20.5% Neutral). For example: I almost never have to look at Dependabot PRs because I have tests, and 99.99% of PRs are merged automatically. Rarely (when dependency changes API for example) I have to manually add some fixes/updates... As mentioned in Section 5.1.3, another pattern is to use Dependabot PRs solely as a way of notification and always perform manual updates. Both cases contribute to the much larger close lag we observe in Dependabot PRs.

In terms of security updates, most developers do handle security PRs with a higher priority (56.7% Strongly Agree, 16.3% Agree, 14.0% Neutral), but they do not think security PRs require more work to review and merge (19.4% Totally Disagree, 36.4% Disagree, 26.4% Neutral). One possible explanation for the slower response, merge, and close of security PRs is that developers consider some security vulnerabilities irrelevant to them: I want it (Dependabot) to ignore security vulnerabilities in development dependencies that don't actually get used in production.

Developers have a mixed opinion on whether Dependabot opens more PRs than they can handle (15.9% Strongly Agree, 15.2% Agree, 22.0% Neutral, 20.5% Disagree, 26.5% Totally Disagree). Whether the PR workload introduced by Dependabot is acceptable may depend on other factors (e.g., the number of dependencies and how fast packages evolve), as indicated by two respondents: 1) The performance of Dependabot or other similar bots could depend on the number of dependencies a project has. For smaller projects, with a handful of dependencies, Dependabot will be less noisy and usually safe as compared to large projects with a lot of dependencies. 2) The utility of something like Dependabot depends heavily on the stack and number of dependencies you have. JS is much more noisy than Ruby, for example, because Ruby moves more slowly.

Findings for RQ2:

70% of Dependabot PRs are merged with a median merge lag of four hours. Compared with regular PRs, developers are less responsive (more time to respond, close or merge) but more receptive (higher merge rate) to security PRs. Developers tend to rapidly merge PRs they consider “safe” and perform manual updates for the remaining PRs.

5.3 RQ3: Compatibility Score

5.3.1 Repository Analysis Methods

We explore the effectiveness of compatibility scores in two aspects: availability and correlation with merge rate.

  1) Availability: We begin our analysis by understanding the data availability of compatibility scores, for they cannot take effect if they are absent from most of the PRs. For this purpose, we obtain compatibility scores from badges in PR bodies, which point to URLs defined per dependency version pair. That is, Dependabot computes one compatibility score for each dependency version pair \langle d, v_1, v_2 \rangle and shows the score in all PRs that update dependency d from v_1 to v_2. In case this computation fails, Dependabot generates an unknown compatibility score for \langle d, v_1, v_2 \rangle.

Since compatibility scores are computed in a data-driven manner, we wonder whether the popularity of the updated dependencies affects their availability. As a quick evaluation, we sample 20 npm dependencies with more than one million downloads per week as representatives of popular dependencies. Next, we retrieve the release history of these dependencies by querying the npm registry API, retaining only releases that became available after January 1, 2020 (recall that all Dependabot PRs in our dataset are created after January 2020, Section 4). For each dependency's releases, we form all possible dependency version pairs via a Cartesian product (1,629 in total) and query their compatibility scores from the corresponding Dependabot URLs.
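A sketch of this availability probe; the npm registry endpoint is real, while the badge URL pattern is the one embedded in Dependabot PR bodies and should be treated as an assumption here:

```python
# Sketch: fetch a package's post-2020 release history from the npm
# registry, form ordered version pairs, and build the compatibility-score
# badge URL for each pair.
from itertools import product
import requests

def version_pairs(package: str):
    meta = requests.get(f"https://registry.npmjs.org/{package}").json()
    times = {v: t for v, t in meta["time"].items()
             if v not in ("created", "modified") and t >= "2020-01-01"}
    versions = sorted(times, key=times.get)  # oldest first
    return [(v1, v2) for v1, v2 in product(versions, versions)
            if times[v1] < times[v2]]        # only "forward" updates in time

def badge_url(package: str, v1: str, v2: str) -> str:
    # URL pattern as it appears in Dependabot PR badge markdown (assumption).
    return ("https://dependabot-badges.githubapp.com/badges/compatibility_score"
            f"?dependency-name={package}&package-manager=npm_and_yarn"
            f"&previous-version={v1}&new-version={v2}")
```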

  2) Correlation with Merge Rate: In theory, if developers perceive compatibility scores as reliable, PRs with higher compatibility scores will be more likely to get merged. To quantitatively evaluate this, we compare merge rates for PRs with different compatibility scores. Since PRs that update the same version pair share the same score, we further utilize Spearman's \rho to measure the correlation between a) the compatibility score for a dependency version pair \langle d, v_1, v_2 \rangle, and b) the merge rate over all PRs that update d from v_1 to v_2.

As we will show in Section 5.3.2, compatibility scores are abnormally scarce. Although we have reached out to the Dependabot maintainers for an explanation, they claim such information to be confidential and decline to share any details. We therefore compute the number of CI test results for each dependency version pair and analyze their overall distribution to provide possible explanations for the scarcity.

Fig. 5: Distribution of compatibility scores and available CI test results over the version pairs of axios. (a) Compatibility Score; (b) # of CI Test Results.

TABLE 6: Compatibility Score and PR Merge Rate

| Compatibility Score | # of PRs | Merge Rate |
|---|---|---|
| unknown | 485,501 | 69.96% |
| < 80% | 1,321 | 30.20% |
| >= 80%, < 90% | 1,605 | 67.48% |
| >= 90%, < 95% | 1,794 | 73.19% |
| >= 95%, < 100% | 2,228 | 84.43% |
| == 100% | 10,303 | 80.30% |
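A minimal sketch of the correlation analysis, assuming a hypothetical flat list `prs` of (dependency, old version, new version, score, merged) records where unknown scores are represented as None:

```python
# Sketch: correlate per-version-pair compatibility scores with merge rates
# using Spearman's rho.
from collections import defaultdict
from scipy.stats import spearmanr

def score_vs_merge_rate(prs):
    merged_flags = defaultdict(list)
    for dep, v1, v2, score, merged in prs:
        if score is not None:            # skip "unknown" scores
            merged_flags[(dep, v1, v2, score)].append(merged)
    pairs = list(merged_flags.items())
    scores = [key[3] for key, _ in pairs]
    rates = [sum(flags) / len(flags) for _, flags in pairs]  # merge rate per pair
    return spearmanr(scores, rates)      # (rho, p-value)
```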

5.3.2 Repository Analysis Results

  1) Availability: Compatibility scores are extremely scarce: only 3.4% of the PRs and 0.5% of the dependency version pairs have a compatibility score other than unknown. Merely 0.18% of the dependency version pairs have a value other than 100%. The scarcity is no better even among the most popular npm dependencies: 1,604 (98.5%) of the 1,629 dependency version pairs we sample only have a compatibility score of unknown, 10 (0.6%) have a compatibility score of 100%, and 15 (0.9%) have a compatibility score of less than 100%. As an example, we plot a compatibility score matrix for axios, which has the most (15) version pairs with compatibility scores, in Figure 5a.

  2) Correlation with Merge Rate: We summarize the merge rates for PRs with different compatibility scores in Table 6. We can observe that for PRs with a compatibility score, a high score indeed increases their chance of being merged: if the score is higher than 90%, developers are more likely to merge the PR. By contrast, if the score is lower than 80%, developers become very unlikely (30.20%) to merge. The Spearman's \rho between compatibility score and merge rate is 0.37 (p < 0.001), indicating a weak correlation according to Prion and Haerling's interpretation [75].

Figure 6 shows the number of dependency version pairs with more than x CI test results. We can observe an extreme Pareto-like distribution: of the 167,053 dependency version pairs in our dataset, fewer than 1,000 have more than 50 CI test results and fewer than 100 have more than 150 CI test results. For the case of axios (Figure 5b), the compatibility scores are indeed only available for version pairs with available CI test results. It is hard to explain why the scores are missing even for some version pairs with many CI test results (e.g., the update from 0.19.2 to 0.20.0), as we do not know the underlying implementation details.

5.3.3 Triangulation from Survey

Developers have diverging opinions on whether compatibility scores are available (7% Strongly Agree, 24.8% Agree, 38.8% Neutral, 17.8% Disagree, 11.6% Totally Disagree) and whether compatibility scores are effective if they are available (4.7% Strongly Agree, 21.7% Agree, 45.7% Neutral, 19.4% Disagree, and 8.5% Totally Disagree). The answer distributions and the high number of Neutral responses likely indicate that many developers do not know how to rate the two statements [76], because compatibility scores are too scarce and most developers have not been exposed to this feature. As replied by one developer: Compatibility scores and vulnerable dependencies detection are great, I use Dependabot a lot but was not aware they exist...(They) should be more visible to the user. Another developer does express concerns that compatibility scores are not effective, saying that Dependabot's compatibility score has never worked for me.

Further, several developers (6 responses in our survey) hold the belief that Dependabot only works well in projects with a high-quality test suite. For example:

  1. Dependabot works best with a high test coverage and if it fails people it's likely because they have too little test coverage.
  2. Dependabot without a good test suite is indeed likely too noisy, but with good tests and an understanding of the code base it is trivial to know whether an update is safe to update or not.

Findings for RQ3:

Compatibility scores are too scarce to be effective: only 3.4% of PRs have a known compatibility score. For those PRs with one, the scores have a weak correlation (\rho = 0.37) with the PR merge rate. The scarcity may be because most dependency version pairs do not have sufficient CI test results (i.e., a Pareto-like distribution) for inferring update compatibility. As a result, developers think Dependabot only works well in projects with high-quality test suites.

5.4 RQ4: Configuration

5.4.1 Repository Analysis Methods

Dependabot offers a wealth of configuration options for integration with project workflows, such as whom to request reviews from, how to write commit messages, how to apply labels, etc. In this research question, we only focus on the options related to notifications because we expect them to be possible countermeasures against noise and notification fatigue. More specifically, we investigate the following options provided by Dependabot:

  1. schedule.interval: This option is mandatory and specifies how often Dependabot scans project dependencies, checks for new versions, and opens update PRs. Possible values include "daily", "weekly", and "monthly".

  2. open-pull-requests-limit: It specifies the maximum number of simultaneously open Dependabot PRs allowed in a project. The default value is five.

  3. allow: It tells Dependabot to only update a subset of dependencies. By default, all dependencies are updated.

  4. ignore: It tells Dependabot to ignore a subset of dependencies. By default, no dependency is ignored.

The latter two options are very flexible and may contain constraints exclusive to some package ecosystems, e.g., allowing updates in production manifests or ignoring patch updates according to the semantic versioning convention [31].
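For illustration, a dependabot.yml excerpt exercising all four options (the option names follow the GitHub documentation [50]; the dependency name and values are arbitrary examples):

```yaml
version: 2
updates:
  - package-ecosystem: "npm"
    directory: "/"
    schedule:
      interval: "weekly"               # "daily", "weekly", or "monthly"
    open-pull-requests-limit: 3        # default is five
    allow:
      - dependency-type: "production"  # only update production manifests
    ignore:
      - dependency-name: "typescript"
        update-types: ["version-update:semver-patch"]  # skip patch releases
```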

To understand developers' current practice of configuring Dependabot, we parse 3,921 Dependabot configurations from 1,588 projects with a dependabot.yml in their current working tree. For schedule.interval and open-pull-requests-limit, we count the frequency of each value. For allow and ignore, we parse the different options and group them into three distinctive strategies:

  1. default: allowing Dependabot to update all dependencies, which is its default behavior;

  2. ignorelist: configuring Dependabot to ignore a subset of dependencies;

  3. allowlist: configuring Dependabot to only update on a subset of dependencies.

We further explore the modification history of Dependabot configurations to observe how developers use configuration as a countermeasure against noise in the wild. For this purpose, we find all commits in the 1,823 projects that modified dependabot.yml and extract eight types of configuration changes from file diffs (a classification sketch follows the list):

  1. +interval: Developers increase schedule.interval.

  2. -interval: Developers decrease schedule.interval.

  3. +limit: Developers increase open-pull-requests-limit.

  4. -limit: Developers decrease open-pull-requests-limit.

  5. +allow: Developers allow some more dependencies to be automatically updated by Dependabot.

  6. -allow: Developers no longer allow some dependencies to be automatically updated by Dependabot.

  7. +ignore: Developers configure Dependabot to ignore some dependencies for automated update.

  8. -ignore: Developers configure Dependabot to no longer ignore some dependencies for automated update.
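As an illustration of this classification, a sketch for the interval-related types, assuming single-ecosystem configurations (real configs may contain several `updates` entries):

```python
# Sketch: classify a dependabot.yml change as +interval / -interval.
# Increasing the interval (e.g., "daily" -> "weekly") means fewer scans
# and hence less noise.
import yaml

INTERVAL_RANK = {"daily": 0, "weekly": 1, "monthly": 2}

def classify_interval_change(old_yml: str, new_yml: str):
    old = yaml.safe_load(old_yml)["updates"][0]["schedule"]["interval"]
    new = yaml.safe_load(new_yml)["updates"][0]["schedule"]["interval"]
    if INTERVAL_RANK[new] > INTERVAL_RANK[old]:
        return "+interval"
    if INTERVAL_RANK[new] < INTERVAL_RANK[old]:
        return "-interval"
    return None  # interval unchanged
```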

(Note that 235 of the 1,823 projects do not have dependabot.yml in their current working tree; we investigate these in RQ5. One project may depend on more than one package ecosystem (e.g., both npm and PyPI) and have separate configurations for each of them.)

Finally, we analyze configuration modifications by time since Dependabot adoption. We mainly focus on bursts of modification patterns, because bursts illustrate the lag from developers' perception of noise to their countermeasures to mitigate the noise.

5.4.2 Repository Analysis Results

The current configurations of Dependabot show that most projects configure Dependabot toward a proactive update strategy: 2,203 (56.2%) of the schedule.interval configurations are "daily" while merely 276 (7.04%) are a conservative "monthly". 1,404 (35.8%) of the open-pull-requests-limit configurations are higher than the default value while only a negligible proportion (2.3%) are lower. For the allow and ignore options, most configurations (3,396, 86.7%) adopt the default strategy, fewer (380, 9.7%) use ignorelist, and a small proportion (50, 1.3%) use allowlist.

The modifications tell another story. 776 (42.57%) of the 1,823 projects in our dataset have modified the Dependabot configuration options we study (e.g., update interval), with 2.18 modification commits on average (median = 1.00). Figure 7 illustrates the proportion of each modification type, which shows that projects increase schedule.interval and lower open-pull-requests-limit more frequently than doing the opposite. As demonstrated in Figure 8, projects may increase schedule.interval at any time after Dependabot adoption but are more likely to reduce open-pull-requests-limit only after several months of Dependabot usage. schedule.interval determines to a large extent how often Dependabot bothers developers, and we see developers of 336 projects increasing it in 868 configurations. We further confirm this behavior as a countermeasure against noise with a real-life example where developers reduce the frequency to monthly to reduce noise [77]. open-pull-requests-limit quantifies the developers' workload on each interaction, which is also noise-related as indicated by a developer's complaint: Dependabot PRs quickly get out of hand [78]. If we focus on modifications that happen 90 days after Dependabot adoption, we find nearly two-thirds (62.5%) of the open-pull-requests-limit changes belong to -limit. Our observations indicate the following phenomenon. At the beginning of adoption, developers configure Dependabot to interact frequently and update proactively. However, they later get overwhelmed and suffer from notification fatigue, which causes them to reduce interaction with Dependabot or even deprecate it (RQ5). As an extreme case, one developer forces Dependabot to open only one PR at a time to reduce noise [79].

Ignoring certain dependencies appears to be another noise countermeasure, as developers tend to add an ignored dependency more often than remove one (Figure 7). For example, a commit says update ignored packages...so they are never automatically updated to stop noise [80]. However, we also observe cases where developers add ignored dependencies with other intentions, such as handling breaking changes [81] and preserving backward compatibility [82]. For +allow and -allow, we observe an interesting burst of -allow earlier (Figure 8c) but more +allow dependencies later, and we do not find any evidence explaining this trend.

5.4.3 Triangulation from Survey

Although more than half of the respondents think Dependabot can be configured to fit their needs (25.6% Strongly Agree and 30.2% Agree), some do not (7.8% Totally Disagree and 14% Disagree). As a peek into this controversy, one developer says, I think people that complain about how noisy it is (I've seen a lot of this) just don't configure things correctly.

More than half (50.4%) of the respondents have configured Dependabot to make it less noisy, but roughly one-third (32.6%) have not (21.2% Strongly Agree, 29.5% Agree, 16.7% Neutral, 20.5% Disagree, 12.1% Totally Disagree). It is possible that the default configurations of Dependabot only work for projects with a limited number of slowly evolving dependencies (see Section 5.2.3); for other projects, developers need to tweak the configurations multiple times to find a sweet spot. However, many respondents eventually find that Dependabot does not offer the options they want for noise reduction, such as update grouping and auto-merge. We will investigate this in depth in RQ5.

Findings for RQ4: The majority of Dependabot configurations imply a proactive update strategy, but we observe multiple patterns of noise avoidance from configuration modifications, such as increasing schedule intervals, lowering the maximum number of open PRs, and ignoring certain dependencies.

5.5 RQ5: Deprecations & Desired Features

5.5.1 Repository Analysis Methods

To locate projects that may have deprecated Dependabot, we find projects with no dependabot.yml in their current working trees, resulting in 235 candidates. For each of them, we identify the last commit that removes dependabot.yml, inspect its commit message, and identify any referenced issues/PRs following the GitHub convention. If the dependabot.yml removal turns out to be due to a project restructuring or a cessation of maintenance, we consider it a false positive and exclude it from further analysis.

For the remaining 206 projects, we analyze the reasons for deprecation from commit messages and issue/PR text (i.e., titles, bodies, and comments). Since a large proportion of the text in commit messages, issues, and PRs is irrelevant to Dependabot deprecation, two authors read and re-read all text in the corpus, retaining only the relevant parts. They encode reasons from the text and discuss them until reaching a consensus. They do not conduct independent coding and measure inter-rater agreement because the corpus is very small (only 27 deprecations contain documented reasons).

For each of the confirmed deprecations, we check bot configuration files and commit/PR history to find possible migrations. We consider a project as having migrated to another dependency management bot (or other automation approaches) if it meets any of the following criteria:

  1. developers have specified the migration target in the commit message or issue/PR text;
  2. dependabot.yml is deleted by another dependency management bot (e.g., Renovate Bot automatically deletes dependabot.yml in its setup PR);
  3. the project adopts another dependency management bot within 30 days before or after Dependabot deprecation.

To obtain developers' desired features for a dependency management bot, we ask two optional open-ended questions at the end of the survey (Table 2), which are answered by 97 and 46 developers, respectively. To identify recurring patterns in the answers, two authors of this paper (both with >6 years of software development experience and familiar with Dependabot) conduct open coding [83] on the responses to generate an initial set of codes. They read and re-read all answers to familiarize themselves with and gain an initial understanding of them. Then, one author assigns text in the answers to initial codes that reflect common features of dependency management bots and discusses with the other author to iteratively refine the codes until a consensus is reached. They further conduct independent coding of the answers using the refined codes and exclude answers that do not reflect anything related to this RQ. As each response may contain multiple codes, we use the MASI distance [84] to measure the distance between the two raters' codes and Krippendorff's alpha [85] to measure inter-rater reliability. The Krippendorff's alpha we obtain is 0.865, which satisfies the recommended threshold of 0.8 and indicates high reliability [85].

5.5.2 Repository Analysis Results

We confirm 206 of the 235 candidates as real-life Dependabot deprecations, which is substantial considering that our dataset contains only 1,823 projects. From Figure 9, we observe that Dependabot deprecations are, in general, evenly distributed over time with a few fluctuations, mostly coming from organization-wide deprecations. For instance, the peak in December 2020 is caused by 26 Dependabot deprecations in octokit, the official GitHub API client implementation. We encode nine categories of reasons from the 27 deprecations that explicitly mention their reasons:

  1. Notification Fatigue (9 Deprecations): Developers do recognize Dependabot's overwhelming notifications and PRs as the central issue in their experience with Dependabot. As noted by one developer: "I've been going mad with dependabot alerts which are annoying and pointless. I'd rather do manual upgrades than use this" [86].

  2. Lack of Grouped Update Support (7 Deprecations): By the Dependabot convention, each PR updates one dependency and one dependency only, which is inconvenient in two scenarios: a) related packages tend to follow similar release schedules, which triggers Dependabot to raise a PR storm on their updates [87]; b) in some cases, dependencies must be updated together to avoid breakage [88]. The excessive notifications and additional manual work quickly frustrate developers. For example: a) "My hope was that we can better group dependency upgrades. With the default configuration, there is some grouping happening, but most dependencies would be upgraded individually" [89]; b) "Also, a lot of packages have to be updated together. Separate PRs for everything isn't very fun" [90].

  3. Package Manager Incompatibility (7 Deprecations): Developers may face compatibility issues after the introduction of a new package manager or a newer version of an existing one. In the seven cases we find, five concern yarn v2, one concerns npm v7 (specifically lockfile v3), and one concerns pnpm. To make matters worse, Dependabot may even exhibit undesirable behaviors, e.g., messing around with yarn lockfiles [91], when encountering such incompatibilities. This contributes to developers' update suspicion, as merging the pull requests leads to possible breakage in dependency specification files. At the time of writing, Dependabot still has no clear timeline for supporting pnpm [92] or yarn v2 [93]. For the unlucky Dependabot users, this means reverting [94], patching Dependabot PRs manually or automatically [95], or migrating to an alternative, e.g., Renovate Bot [96].

  4. Lack of Configurability (5 Deprecations): Dependabot is also deprecated because developers struggle to tailor a suitable configuration. For example: a) "it appears that we're not able to configure Dependabot to only give us major/minor upgrades" [97]; b) "Dependabot would require too much configuration long-term; too easy to forget to add a new package directory" [98]. Developers mention that other dependency management bots provide more fine-grained configuration options, such as update scope and schedule: "(Renovate Bot) has a load more options we could tweak too compared to Dependabot if we want to reduce the frequency further" [99].

  5. Absence of Auto-Merge (3 Deprecations): Alfadel et al. [54] illustrate that auto-merge features are tightly associated with rapid PR merges. However, GitHub refused to offer this feature in Dependabot [100], claiming that auto-merge would allow malicious dependencies to propagate beyond the supervision of project maintainers. This may render Dependabot impractical, as claimed by one developer: "(the absence of auto-merge) creates clutter and possibly high maintenance load."

We notice that a non-negligible proportion (8.17%) of pull requests are merged by third-party auto-merge implementations (e.g., a CI workflow or a GitHub App). Unfortunately, these may become dysfunctional on public repositories after GitHub enforced a change on workflows triggered by Dependabot PRs [101]. This turns out to be the last straw for several Dependabot deprecations. As one developer states, they dropped Dependabot because "latest changes enforced by GitHub prevent using the action in Dependabot's PRs context."

  6. High CI Usage (3 Deprecations): Maintainers of three projects complain that Dependabot's numerous, auto-rebasing PRs have devoured their CI credits. In their words, Dependabot's CI usage is "what killed us with Dependabot" and "a waste of money and carbon".

Other reasons for Dependabot deprecation include: 7) Dependabot Bugs (2 Deprecations), 8) Unsatisfactory Branch Support (1 Deprecation), and 9) Inability to Modify Custom Files (1 Deprecation).

The deprecation of Dependabot does not necessarily mean developers lose faith in automating dependency updates. In fact, over two-thirds (68.4%, 141/206) of the projects turn to another bot or set up custom CI workflows to support their dependency updates. Among them, Renovate Bot (122) is the most popular migration target, followed by projen (15), npm-check-updates (2), and depfu (1).

5.5.3 Triangulation from Survey

Among the 131 surveyed developers, 14 (10.7%) tell us they have deprecated Dependabot in their projects. Most of the reasons they provide fall within our analysis, and the frequency distribution is highly similar. There are two exceptions: one deprecates because Dependabot frequently breaks code, and one because their entire project has stalled. Respondents also say that they consider automated dependency management important and beneficial for their projects, but the limitations of Dependabot cause them to deprecate it. For example: "Dependabot could be great, it just needs a few fixes here and there. It's unclear why Dependabot hasn't been polished." They also tell us that Renovate Bot does provide some features they need (e.g., grouped update PRs).

We identify nine major categories of developers' desired features (each corresponding to one code) from the answers provided by 84 respondents. The remaining categories are discarded as they are each supported by only one answer (and thus may be occasional and not generalizable). We explain each category in order of popularity.

  1. Group Update PRs (29 Respondents): This category refers to the feature of automatically grouping several dependency updates into one PR instead of opening one PR for each update. It is the most frequently mentioned category, and developers consider this feature an important measure for making the handling of bot PRs less tedious, repetitive, and time-consuming. They want the bot to automatically identify dependencies that should be updated together and combine them into one update PR, because many libraries (e.g., symfony, @typescript-eslint, babel) version all their packages under a single version. They also want the bot to automatically find and merge "safe" updates into one PR while leaving "unsafe" updates as single PRs for more careful review.

  2. Package Manager Support (20 Respondents): This category refers to supporting more package managers (and their corresponding ecosystems) or adding features that align the bot with the conventions of a package manager/ecosystem. Developers express their desire for the bot to support Gradle, Flutter, Poetry, Anaconda, C++, yarn v2, Clojure, Cargo, CocoaPods, Swift Package Manager (on iOS), etc., indicating that dependency management bots, if well designed and implemented, can benefit a wide range of developers and software development domains. Dependabot does claim support for many of the package managers mentioned above, but it still needs to be tailored and improved in, e.g., performance and update behavior: a) "When I have 3 open Poetry updates I can merge one and then have to wait 15 minutes for the conflicts to be resolved." b) "Perhaps for node.js projects the ability to update package.json in addition to package.lock, so the dependency update is made explicit."

  3. Auto-Merge (19 Respondents): This category refers to the feature of automatically merging some update PRs into the repository if certain conditions are satisfied. As mentioned in Section 5.3.3, some developers believe that as long as their projects have high-quality test suites, reviewing an update PR is trivial, and they would prefer such PRs to be merged automatically if the tests pass.

Despite the significant demand, this feature seems especially controversial because enabling it means offloading trust and granting the bot autonomy. Although GitHub considers it unacceptable due to security risks [100], our survey clearly indicates that many developers still want it even though they are well aware of the risks. They also think the responsibility of risk control, e.g., vetting new releases, should be given to some capable central authority, not to them. Here are three response examples: a) "While this might be somewhat dangerous, and should be configurable somehow, [auto-merge] is something that basically already happens when I merge such PRs." b) "If I am merging with Dependabot like 60 deps a day - I don't know if some of the versions are not published by hackers who took over the repository account, so it would be great if there was some authority where humans actually check the changes and mark them secure." c) "For me it'd be good if I could mute all notifications about Dependabot PRs except for when tests failed, indicating that I need to manually resolve some issues. Otherwise I'd be happy not to hear about it updating my deps."

  4. Display Release Notes (8 Respondents): This category refers to the feature of always showing some form of release notes or changelog in update PRs to inform developers of the changes in an update. Although Dependabot can sometimes provide release notes in PRs (Figure 1), it fails to do so for 24.8% of the PRs in our dataset. One possible reason is that release notes are often missing or inaccessible in open source projects [35], which is also confirmed by one of our survey respondents: "Most npm package updates feel unnecessary and the maintainers very often don't bother to write meaningful release notes...At the same time, I shouldn't expect maintainers to go through all of their dependencies' changelogs either, so perhaps the tool should find those release notes for me."

  5. Avoid Unnecessary Updates (7 Respondents): This category refers to the feature of providing default behaviors and configuration options that avoid updates most developers in an ecosystem perceive as unnecessary. The most frequently mentioned example is the ability to define separate update behaviors for development and production (or runtime) dependencies (see the sketch after this list). Many developers would avoid the automatic update of development dependencies because they perceive such updates as mostly noise, with very little gain in keeping development dependencies up-to-date. Other mentioned features include the ability to detect and skip updates of bloated dependencies and to only provide updates for dependencies with real security vulnerabilities.

  6. Custom Update Action (5 Respondents): This category refers to the ability to define custom update behaviors (using, e.g., regular expressions) to update dependencies in unconventional dependency files.

  7. Configurability (5 Respondents): This category refers to developers expressing that dependency management bots should be highly configurable (e.g., "more configuration options") without providing further information on the specific configuration options they want.

  8. git Support (4 Respondents): This category concerns the integration of dependency management bots with the version control system (in our case, git). The specific features mentioned include automatic rebasing, merge conflict resolution, squashing, etc., all of which help ensure that bot PRs do not incur additional work for developers (e.g., manipulating git branches and resolving conflicts).

  9. Breaking Change Impact Analysis (3 Respondents): This category refers to the ability to perform program analysis to identify breaking changes and their impact on client code, e.g., "something like a list of parts of my codebase that might be impacted by the update would be useful. This could be based on a combination of changes listed in the release notes and an analysis of where the package is used in my code."
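
For the development-versus-production distinction in category 5, Dependabot's allow option already covers part of the need in ecosystems that distinguish the two dependency groups; a minimal sketch with an illustrative ecosystem:

```yaml
# Only open version-update PRs for production (runtime) dependencies;
# development dependencies are left out of version updates.
version: 2
updates:
  - package-ecosystem: "npm"
    directory: "/"
    schedule:
      interval: "weekly"
    allow:
      - dependency-type: "production"
```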

The developers' desired features align well with the reasons for Dependabot deprecation, indicating that feature availability can be an important driver of the migrations to and competition between dependency management bots.

Findings for RQ5:

11.3% of the studied projects have deprecated Dependabot due to notification fatigue, lack of grouped update support, package manager incompatibility, lack of configurability, absence of auto-merge, etc. 68.4% of them migrate to other automation approaches, among which the most common migration target is Renovate Bot (86.5%). We identify nine categories of developers' desired features that align well with the Dependabot deprecation reasons.

6 DISCUSSION

6.1 The State of Dependency Management Bots

In a nutshell, our results indicate that Dependabot can be an effective solution for keeping dependencies up-to-date (RQ1, RQ2), but often at the cost of significant noise and workload (RQ1, RQ4, RQ5), much of which cannot be mitigated by the features and configuration options offered by Dependabot (RQ5). Apart from that, Dependabot's compatibility score solution is hardly a success in indicating the compatibility of a bot update PR (RQ3). As of March 2023, Dependabot is still under active development by GitHub, with the majority of the effort going into supporting more ecosystems (e.g., Docker, GitHub Actions) and adding features to reduce noise (e.g., automatically terminating Dependabot in inactive repositories), according to the GitHub changelog [102]. Still, there is plenty of room for improvement to tackle the update suspicion and notification fatigue problems [8].

Among other dependency management bots, Renovate Bot is an actively developed and popular alternative to Dependabot version updates (RQ5), while Greenkeeper [9] has been deprecated, PyUp [5] seems to be no longer under active development, and Snyk Bot [6] mainly offers security-focused solutions. As of March 2023, Renovate Bot provides more features and configuration options than Dependabot for fine-tuning notifications, including update grouping and auto-merge [103]; it also provides merge confidence badges that carry more information than Dependabot's compatibility scores [104]. However, it remains unclear whether the features and strategies adopted by Renovate Bot are actually effective in practice, and we believe Renovate Bot could be an important subject for future dependency management bot studies.

6.2 What Should be the Key Characteristics of a Dependency Management Bot?

In this section, we try to summarize the key characteristics of an ideal dependency management bot based on the results from our analysis and previous work. We believe they can serve as general design guidelines for practitioners to design, implement, or improve dependency management bots (or other similar automation solutions).

Configurability. Wessel et al. [40] argue that noise is the central challenge in SE bot design and that re-configuration should be the main countermeasure against it. In the case of Dependabot, we find that the bot also causes noise by opening more PRs than developers can handle (RQ4) and that developers re-configure it multiple times to reduce the noise (RQ4). However, re-configuration is not always successful due to the lack of certain features in Dependabot, causing deprecations and migrations (RQ5). As with many other software development activities, a "silver bullet" is unlikely to exist, as noted by one of our survey respondents: "...there is no best practice in dependency management which is easy, fast and safe."

Therefore, we argue that configurability, i.e., offering the highest possible configuration flexibility for controlling update behavior, should be one of the key characteristics of dependency management bots. It helps the bot minimize unnecessary update notifications and attempts so that developers are less interrupted. Apart from the options already provided by Dependabot, our study indicates that the following configuration options should be present in dependency management bots (a hypothetical sketch follows the list):

  1. Grouped Updates: Dependency management bots should provide options to group multiple updates into one PR. Possible options include grouping all “safe” updates (e.g., not breaking the CI checks) and updates of closely related dependencies (e.g., different components from the same framework).

  2. Update Strategies: Dependency management bots should allow developers to specify which dependency to update based on more conditions, such as whether the dependency is used in production, the severity of security vulnerabilities, whether the dependency is bloated, etc.

  3. Version Control System Integration: Dependency management bots should allow developers to define how the bot should interact with the version control system, including which branch to monitor, how to manipulate branches and handle merge conflicts, etc.
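
The YAML sketch below illustrates what these three option families could look like in a dependabot.yml-style file. The keys groups, update-strategy, and vcs, as well as their values, are invented for illustration and are not actual Dependabot syntax (Dependabot does offer related options such as target-branch and rebase-strategy):

```yaml
# Hypothetical configuration illustrating the three proposed option families.
version: 2
updates:
  - package-ecosystem: "npm"
    directory: "/"
    groups:                          # 1) grouped updates (hypothetical key)
      framework:
        patterns: ["@angular/*"]     # closely related components in one PR
      safe:
        condition: "ci-passing"      # all CI-green updates in one PR
    update-strategy:                 # 2) update strategies (hypothetical key)
      dependency-type: "production"
      min-vulnerability-severity: "moderate"
      skip-bloated: true
    vcs:                             # 3) VCS integration (hypothetical key)
      target-branch: "develop"
      rebase-strategy: "auto"
      conflict-resolution: "lockfile-only"
```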

Autonomy. According to the SE bot definition by Erlenhov et al. [39], the key characteristic of an "Alex" type of SE bot is its ability to autonomously handle (often simple) development tasks, and its central design challenges include minimizing interruption and establishing trust with developers. However, without an auto-merge feature, Dependabot is hardly autonomous, and this lack of autonomy is disliked by developers (RQ5); in extreme cases, developers use Dependabot entirely as a notification tool rather than as a bot (Section 5.1.3). The lack of autonomy also causes a high level of interruption and workload for developers using Dependabot in their projects (RQ5).

We argue that autonomy, i.e., the ability to perform dependency updates autonomously without human intervention under certain conditions, should be one of the key characteristics of dependency management bots. This characteristic is only possible when the risks and consequences of dependency updates are highly transparent and developers know when to trust these updates. Within the context of GitHub, we believe current dependency management bots should offer a configuration option to merge update PRs when the CI pipeline passes. This option can be turned on for projects with a well-configured CI pipeline featuring thorough static analysis, building, and testing stages, where the developers believe the pipeline can effectively detect incompatibilities in dependency updates (Section 5.3.3).
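
Within GitHub, this CI-gated auto-merge can already be approximated with a small Actions workflow of the shape documented by GitHub; the sketch below is illustrative (the action version and the patch-only condition are our assumptions), and it relies on the repository having auto-merge enabled and branch protection that requires the CI checks to pass:

```yaml
# .github/workflows/dependabot-auto-merge.yml (illustrative sketch)
name: Dependabot auto-merge
on: pull_request

permissions:
  contents: write
  pull-requests: write

jobs:
  auto-merge:
    runs-on: ubuntu-latest
    if: github.actor == 'dependabot[bot]'
    steps:
      - name: Fetch Dependabot metadata
        id: metadata
        uses: dependabot/fetch-metadata@v1
        with:
          github-token: "${{ secrets.GITHUB_TOKEN }}"
      # Only auto-merge low-risk (patch) updates; the PR then merges
      # automatically once all required status checks succeed.
      - name: Enable auto-merge for patch updates
        if: steps.metadata.outputs.update-type == 'version-update:semver-patch'
        run: gh pr merge --auto --merge "$PR_URL"
        env:
          PR_URL: ${{ github.event.pull_request.html_url }}
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```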

With respect to the security concern that auto-merge can be abused to quickly propagate a malicious package across an ecosystem [100], we argue that the responsibility of verifying the security of new releases should not rest with individual developers, as they usually lack the required time and expertise (RQ5). Instead, package hosting platforms (e.g., npm, Maven, PyPI) should vet new package releases and quickly take down malicious releases to minimize their impact. These practices are also advocated in the literature on software supply chain attacks [105].

Transparency. Multiple previous studies, both on SE bots and on other kinds of software bots, point to the importance of transparency in bot design. For example, Erlenhov et al. [39] show that developers need to establish trust that a bot performs development tasks correctly. Similarly, Godulla et al. [106] argue that transparency is vital for bots used in corporate communications. In the context of code review bots, Peng and Ma [107] find that contributors expect the bot to be transparent about why a certain code reviewer is recommended. To reduce update suspicion [8] toward dependency management bots, developers likewise need to know when to trust the bot to perform dependency updates.

We argue that transparency, i.e., the ability to transparently demonstrate the risks and consequences of a dependency update, should be one of the key characteristics of dependency management bots. However, Dependabot's compatibility score feature is hardly a success in this direction, and developers only trust their own test suites. Beyond compatibility scores and project test suites, the following research directions may help enable transparency in dependency management bots and establish trust among bot users:

  1. Program Analysis: One direction is to leverage program analysis techniques. There has been significant research and practitioner effort on breaking change analysis [36], two instances of which have demonstrated the potential of static analysis in assessing bot PR compatibility [34], [108]. Still, given the extremely large scale of bot PRs [7], more research and engineering effort is needed to implement lightweight and scalable approaches for each popular ecosystem.

  2. CI Log Analysis: Another direction is to extend the idea of compatibility scores with sophisticated techniques that learn more from CI checks. Since CI checks are scarce for many version pairs (RQ3), it would be interesting to explore techniques that transfer knowledge from other version pairs so that the matrix in Figure 5a becomes less sparse. The massive number of CI checks available from Dependabot PRs would be a promising starting point.

  3. Release Note Generation: Dependabot sometimes fails to locate and provide a release note for the updated dependency, and even when one exists, "the maintainers very often don't bother to write meaningful release notes", as noted by one respondent. This situation can be mitigated by applying approaches to software change summarization (e.g., [109]) and release note generation (e.g., [110]).

Self-Adaptability. The ability to adapt to a specific environment and its dynamics is considered one of the key characteristics of a "rational agent" in artificial intelligence [111], [112]. Dependency management bots can also be considered autonomous agents working in the artificial environment of social coding platforms (e.g., GitHub). However, our findings reveal that Dependabot often cannot operate in the ways expected by developers (RQ5) and that re-configurations are common (RQ4). Such failures (e.g., in update actions, package manager compatibility, and git branching) lead to interruption and extra work for developers.

We argue that self-adaptability, i.e., the ability to automatically identify and adapt to a sensible default configuration for a project's environment, should be one of the key characteristics of dependency management bots. For a GitHub project, this environment can include its major programming languages, package managers and ecosystems, the workflows used, the active timezone, developer preferences, and recent activities. A dependency management bot should be able to automatically generate a configuration file from such information and recommend configuration changes when the environment changes (e.g., when developer responses to bot PRs become slower than usual). This could be implemented as a semi-automatic recommender system that suggests an initial configuration to developers and, after bot adoption, opens bot PRs that modify the bot's own configuration.

6.3 Comparison with Previous Work

Several previous studies have made similar recommendations based on results from Greenkeeper or Dependabot [8], [37], [54], [56]. Studies on Greenkeeper [8], [37] show that dependency management bots cause noise to developers and that CI test results are unreliable, but they do not investigate the effectiveness of bot configurations as a countermeasure against noise. Studies on Dependabot [54], [56] either focus on a different aspect (i.e., security updates [54]) or provide specific recommendations on Dependabot features [56]. Compared with these studies, the contributions of our study are: 1) a systematic investigation of the Dependabot version update service, and 2) a comprehensive four-dimension framework for dependency management bot design.

The implications of our study also relate to the larger literature on SE bots and dependency management. With respect to these two fields, the contribution of our study is a unique lens of observation, i.e., Dependabot, that results in a set of tailored recommendations for dependency management bot design. We have discussed in Section 6.2 how the implications of our study confirm, extend, or echo those of the existing literature.

6.4 Threats to Validity

6.4.1 Internal Validity

In RQ1, we provide a holistic analysis of the impact of Dependabot adoption without incorporating possible confounding factors (e.g., the types of dependencies and the characteristics of projects). Consequently, it is difficult for our study to establish a firm answer on the effectiveness of adopting Dependabot, and future work is needed to better quantify this impact among possible confounding factors.

Several approximations are used throughout our analysis. In RQ2, we resort to identifying security PRs ourselves, which may introduce hard-to-confirm errors (only repository owners know whether their PRs are security-related). The merge rate may not accurately reflect the extent to which Dependabot updates are accepted by developers, as some projects may use different ways of accepting contributions. To mitigate this threat, we focus on projects that have merged at least 10 Dependabot PRs, with the intuition that these projects are unlikely to accept Dependabot PRs in other ways if they have already merged many of them. In RQ3, Dependabot's compatibility scores may change over time, and it is impossible to know the score at the time of PR creation. In RQ4, Dependabot supports ecosystem-specific matchers in dependency specifications, e.g., @angular/*, which we do not consider when parsing configuration files. However, we believe the noise introduced above should be minor and does not invalidate our findings or hinder the reproducibility of our data analysis. Like other studies involving manual coding, our analysis of developer discussions and survey responses is vulnerable to author bias. To mitigate this, two authors double-check all results and validate findings against project commit/PR histories for RQ5; they further conduct an inter-rater reliability analysis for RQ5 when the dataset becomes larger. Finally, our own interpretation of the data (RQ1-RQ5) may also be biased toward our own judgment. To mitigate this, we triangulate our key findings using a developer survey and derive implications based on both our analysis and developers' feedback.

6.4.2 External Validity

As with all case studies, one should be cautious when generalizing our specific findings in each RQ to other dependency management bots and even to other projects using Dependabot. Our dataset only contains popular and actively maintained GitHub projects, many of which already take proactive updating strategies. Therefore, our findings may not generalize to projects of a smaller scale or projects more reluctant to update dependencies. The survey responses are collected through convenience sampling, which may introduce unknown biases in terms of experience, age, gender, development role, etc., so the generalization of our survey results to a broad developer audience should also be made with caution. The outcomes of Dependabot usage may likewise not generalize to other dependency management bots due to differences in functionality and user base. In RQ1, we base our analysis only on JavaScript/npm projects, which may not generalize to other ecosystems with different norms, policies, and practices [11]; the comparison of dependency management bot usage across ecosystems could be an important avenue for future work. Despite these limitations, we believe the implications we obtain for dependency management bot design are general. Our proposed framework in Section 6.2 forms a roadmap for dependency management bot designers, and our methodology could be applied in future studies to compare the effectiveness of different bots.

7 CONCLUSION

We present an exploratory study of the Dependabot version update service using repository mining and a survey, and we identify important limitations in the design of Dependabot. From our findings, we derive a four-dimension framework in the hope that it can guide dependency management bot design and inspire more research in related fields.

Several directions for future work arise from our study. For example, investigating and comparing other dependency management bots, especially Renovate Bot, can help verify the generalizability of our proposed framework. An empirical foundation on the factors affecting the effectiveness of bot adoption is also necessary. It will also be interesting to investigate the recommendation of bot configurations to developers, or to study how different approaches (e.g., program analysis, machine learning, release note generation) can help developers assess the compatibility of bot PRs.

8 DATA AVAILABILITY

We provide a replication package at Figshare:

https://figshare.com/s/78a92332e4843d64b984

The package can be used to replicate the results from repository mining. To preserve the privacy of survey respondents, we choose not to disclose any raw data from the survey.

ACKNOWLEDGMENTS

This work is supported by the National Key R&D Program of China Grant 2018YFB1004201 and the National Natural Science Foundation of China Grant 61825201. We sincerely thank the developers who participated in our survey.

REFERENCES

[1] T. Winters, T. Manshreck, and H. Wright, Software Engineering at Google: Lessons Learned from Programming over Time. O'Reilly Media, 2020.

[2] R. G. Kula, D. M. Germán, A. Ouni, T. Ishio, and K. Inoue, "Do developers update their library dependencies? - an empirical study on the impact of security advisories on library migration," Empir. Softw. Eng., vol. 23, no. 1, pp. 384-417, 2018.

[3] https://github.com/dependabot.

[4] https://github.com/renovatebot.

[5] https://pyup.io/.

[6] https://github.com/snyk-bot.

[7] M. Wyrich, R. Ghit, T. Haller, and C. Müller, "Bots don't mind waiting, do they? comparing the interaction with automatically and manually created pull requests," in 3rd IEEE/ACM International Workshop on Bots in Software Engineering, BotSE@ICSE 2021, Madrid, Spain, June 4, 2021. IEEE, 2021, pp. 6-10.

[8] S. Mirhosseini and C. Parnin, "Can automated pull requests encourage software developers to upgrade out-of-date dependencies?" in Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017, Urbana, IL, USA, October 30 - November 03, 2017. IEEE Computer Society, 2017, pp. 84-94.

[9] https://greenkeeper.io/.

[10] https://www.sonatype.com/resources/state-of-the-software-supply-chain-2021.

[11] C. Bogart, C. Kästner, J. D. Herbsleb, and F. Thung, "When and how to make breaking changes: Policies and practices in 18 open source software ecosystems," ACM Trans. Softw. Eng. Methodol., vol. 30, no. 4, pp. 42:1-42:56, 2021.

[12] G. Bavota, G. Canfora, M. D. Penta, R. Oliveto, and S. Panichella, "How the apache community upgrades dependencies: an evolutionary study," Empir. Softw. Eng., vol. 20, no. 5, pp. 1275-1317, 2015.

[13] I. Pashchenko, D. L. Vu, and F. Massacci, "A qualitative study of dependency management and its security implications," in CCS '20: 2020 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, USA, November 9-13, 2020. ACM, 2020, pp. 1513-1531.

[14] J. Cox, E. Bouwers, M. C. J. D. van Eekelen, and J. Visser, "Measuring dependency freshness in software systems," in 37th IEEE/ACM International Conference on Software Engineering, ICSE 2015, Florence, Italy, May 16-24, 2015, Volume 2. IEEE Computer Society, 2015, pp. 109-118.

[15] J. M. González-Barahona, P. Sherwood, G. Robles, and D. Izquierdo-Cortazar, "Technical lag in software compilations: Measuring how outdated a software deployment is," in Open Source Systems: Towards Robust Practices - 13th IFIP WG 2.13 International Conference, OSS 2017, Buenos Aires, Argentina, May 22-23, 2017, Proceedings, ser. IFIP Advances in Information and Communication Technology, vol. 496, 2017, pp. 182-192.

[16] A. Zerouali, T. Mens, and E. Constantinou, "On the evolution of technical lag in the npm package dependency network," in 2018 IEEE International Conference on Software Maintenance and Evolution, ICSME 2018, Madrid, Spain, September 23-29, 2018. IEEE Computer Society, 2018, pp. 404-414.

[17] A. Zerouali, E. Constantinou, T. Mens, G. Robles, and J. M. González-Barahona, "An empirical analysis of technical lag in npm package dependencies," in New Opportunities for Software Reuse - 17th International Conference, ICSR 2018, Madrid, Spain, May 21-23, 2018, Proceedings, ser. Lecture Notes in Computer Science, vol. 10826. Springer, 2018, pp. 95-110.

[18] A. Zerouali, T. Mens, J. M. González-Barahona, A. Decan, E. Constantinou, and G. Robles, "A formal framework for measuring technical lag in component repositories - and its application to npm," J. Softw. Evol. Process., vol. 31, no. 8, 2019.

[19] J. Stringer, A. Tahir, K. Blincoe, and J. Dietrich, "Technical lag of dependencies in major package managers," in 27th Asia-Pacific Software Engineering Conference, APSEC 2020, Singapore, December 1-4, 2020. IEEE, 2020, pp. 228-237.

[20] A. Zerouali, T. Mens, A. Decan, J. M. González-Barahona, and G. Robles, "A multi-dimensional analysis of technical lag in Debian-based docker images," Empir. Softw. Eng., vol. 26, no. 2, p. 19, 2021.

[21] K. Chow and D. Notkin, "Semi-automatic update of applications in response to library changes," in 1996 International Conference on Software Maintenance (ICSM '96), 4-8 November 1996, Monterey, CA, USA, Proceedings. IEEE Computer Society, 1996, p. 359.

[22] J. Henkel and A. Diwan, "CatchUp: capturing and replaying refactorings to support API evolution," in 27th International Conference on Software Engineering (ICSE 2005), 15-21 May 2005, St. Louis, Missouri, USA. ACM, 2005, pp. 274-283.

[23] Z. Xing and E. Stroulia, "API-evolution support with Diff-CatchUp," IEEE Trans. Software Eng., vol. 33, no. 12, pp. 818-836, 2007.

[24] H. A. Nguyen, T. T. Nguyen, G. W. Jr., A. T. Nguyen, M. Kim, and T. N. Nguyen, "A graph-based approach to API usage adaptation," in Proceedings of the 25th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2010, October 17-21, 2010, Reno/Tahoe, Nevada, USA. ACM, 2010, pp. 302-321.

[25] B. Dagenais and M. P. Robillard, "Recommending adaptive changes for framework evolution," ACM Trans. Softw. Eng. Methodol., vol. 20, no. 4, pp. 19:1-19:35, 2011.

[26] B. Cossette and R. J. Walker, "Seeking the ground truth: a retrospective study on the evolution and migration of software libraries," in 20th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE-20), SIGSOFT/FSE'12, Cary, NC, USA - November 11 - 16, 2012. ACM, 2012, p. 55.

[27] K. Huang, B. Chen, L. Pan, S. Wu, and X. Peng, "REPFINDER: finding replacements for missing APIs in library update," in 36th IEEE/ACM International Conference on Automated Software Engineering, ASE 2021, Melbourne, Australia, November 15-19, 2021. IEEE, 2021, pp. 266-278.

[28] B. B. Nielsen, M. T. Torp, and A. Møller, "Semantic patches for adaptation of JavaScript programs to evolving libraries," in 43rd IEEE/ACM International Conference on Software Engineering, ICSE 2021, Madrid, Spain, 22-30 May 2021. IEEE, 2021, pp. 74-85.

[29] S. A. Haryono, F. Thung, D. Lo, J. Lawall, and L. Jiang, "ML-CatchUp: Automated update of deprecated machine-learning APIs in Python," in IEEE International Conference on Software Maintenance and Evolution, ICSME 2021, Luxembourg, September 27 - October 1, 2021. IEEE, 2021, pp. 584-588.

[30] S. A. Haryono, F. Thung, D. Lo, L. Jiang, J. Lawall, H. J. Kang, L. Serrano, and G. Muller, "AndroEvolve: Automated Android API update with data flow analysis and variable denormalization," Empir. Softw. Eng., vol. 27, no. 3, p. 73, 2022.

[31] https://semver.org/.

[32] S. Mostafa, R. Rodriguez, and X. Wang, "Experience paper: a study on behavioral backward incompatibilities of Java software libraries," in Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, Santa Barbara, CA, USA, July 10 - 14, 2017. ACM, 2017, pp. 215-225.

[33] S. Raemaekers, A. van Deursen, and J. Visser, "Semantic versioning and impact of breaking changes in the Maven repository," J. Syst. Softw., vol. 129, pp. 140-158, 2017.

[34] J. Hejderup and G. Gousios, “Can we trust tests to automate dependency updates? A case study of Java projects,” J. Syst. Softw., vol. 183, p. 111097, 2022.

[35] J. Wu, H. He, W. Xiao, K. Gao, and M. Zhou, “Demystifying software release note issues on GitHub,” in Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, ICPC 2022, Pittsburgh, USA, May 16-17, 2022. ACM, 2022.

[36] P. Lam, J. Dietrich, and D. J. Pearce, "Putting the semantics into semantic versioning," in Proceedings of the 2020 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software, Onward! 2020, Virtual, November, 2020. ACM, 2020, pp. 157-179.

[37] B. Rombaut, F. R. Cogo, B. Adams, and A. E. Hassan, “Theres no such thing as a free lunch: Lessons learned from exploring the overhead introduced by the Greenkeeper dependency bot in npm,” ACM Transactions on Software Engineering and Methodology, 2022.

[38] M. S. Wessel, B. M. de Souza, I. Steinmacher, I. S. Wiese, I. Polato, A. P. Chaves, and M. A. Gerosa, "The power of bots: Characterizing and understanding bots in OSS projects," Proc. ACM Hum. Comput. Interact., vol. 2, no. CSCW, pp. 182:1-182:19, 2018.

[39] L. Erlenhov, F. G. de Oliveira Neto, and P. Leitner, "An empirical study of bots in software development: characteristics and challenges from a practitioner's perspective," in ESEC/FSE '20: 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Virtual Event, USA, November 8-13, 2020. ACM, 2020, pp. 445-455.

[40] M. S. Wessel, I. Wiese, I. Steinmacher, and M. A. Gerosa, "Don't disturb me: Challenges of interacting with software bots on open source software projects," Proc. ACM Hum. Comput. Interact., vol. 5, no. CSCW2, pp. 1-21, 2021.

[41] M. S. Wessel, A. Abdellatif, I. Wiese, T. Conte, E. Shihab, M. A. Gerosa, and I. Steinmacher, "Bots for pull requests: The good, the bad, and the promising," in 44th IEEE/ACM International Conference on Software Engineering, ICSE 2022, Pittsburgh, PA, USA, May 25-27, 2022. IEEE, 2022, pp. 274-286.

[42] E. Shihab, S. Wagner, M. A. Gerosa, M. Wessel, and J. Cabot, “The present and future of bots in software engineering,” IEEE Software, 2022.

[43] S. Santhanam, T. Hecking, A. Schreiber, and S. Wagner, “Bots in software engineering: a systematic mapping study,” PeerJ Comput. Sci., vol. 8, p. e866, 2022.

[44] https://github.com/apps/dependabot-preview.

[45] L. Erlenhov, F. G. de Oliveira Neto, and P. Leitner, “Dependency management bots in open-source systems - prevalence and adoption,” PeerJ Comput. Sci., vol. 8, p. e849, 2022.

[46] T. Dey, S. Mousavi, E. Ponce, T. Fry, B. Vasilescu, A. Filippova, and A. Mockus, "Detecting and characterizing bots that commit code," in MSR '20: 17th International Conference on Mining Software Repositories, Seoul, Republic of Korea, 29-30 June, 2020. ACM, 2020, pp. 209-219.

[47] https://www.indiehackers.com/interview/living-off-our-savings-and-growing-our-saas-to-740-mo.

[48] https://www.indiehackers.com/product/dependabot-acquired-by-github-1g7T7DN1rGEZM204shF.

[49] https://github.com/baker/dependabot-preview/.

[50] https://docs.github.com/en/code-security/supply-chain-security/keeping-your-dependencies-updated/configuration-options-for-dependency-updates.

[51] https://docs.github.com/en/code-security/supply-chain-security/managing-vulnerabilities-in-your-projects-dependencies/about-alerts-for-vulnerable-dependencies#access-to-dependabot-alerts.

[52] Pull Request #1127 of datadesk/baker.

[53] https://docs.github.com/en/code-security/supply-chain-security/managing-vulnerabilities-in-your-projects-dependencies/about-dependabot-security-updates.

[54] M. Alfadel, D. E. Costa, E. Shihab, and M. Mkhallalati, "On the use of Dependabot security pull requests," in 18th IEEE/ACM International Conference on Mining Software Repositories, MSR 2021, Madrid, Spain, May 17-19, 2021. IEEE, 2021, pp. 254-265.

[55] C. Soto-Valero, T. Durieux, and B. Baudry, "A longitudinal analysis of bloated Java dependencies," in ESEC/FSE '21: 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Athens, Greece, August 23-28, 2021. ACM, 2021, pp. 1021-1031.

[56] F. R. Cogo and A. E. Hassan, "Understanding the customization of dependency bots: The case of dependabot," IEEE Software, 2022.

[57] Pull Request #4317 of caddyserver/caddy.

[58] G. Gousios, "The GHTorrent dataset and tool suite," in Proceedings of the 10th Working Conference on Mining Software Repositories, ser. MSR '13. Piscataway, NJ, USA: IEEE Press, 2013, pp. 233-236.

[59] N. Munaiah, S. Kroh, C. Cabrey, and M. Nagappan, "Curating GitHub for engineered software projects," Empir. Softw. Eng., vol. 22, no. 6, pp. 3219-3253, 2017.

[60] H. He, R. He, H. Gu, and M. Zhou, "A large-scale empirical study on Java library migrations: prevalence, trends, and rationales," in ESEC/FSE '21: 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Athens, Greece, August 23-28, 2021. ACM, 2021, pp. 478-490.

[61] https://docs.github.com/en/rest.

[62] https://workers.cloudflare.com/.

[63] https://github.com/advisories.

[64] M. Goeminne and T. Mens, "Evidence for the Pareto principle in open source software activity," in the Joint Proceedings of the 1st International Workshop on Model Driven Software Maintenance and 5th International Workshop on Software Quality and Maintainability. Citeseer, 2011, pp. 74-82.

[65] Y. Zhang, M. Zhou, A. Mockus, and Z. Jin, "Companies' participation in OSS development—an empirical study of OpenStack," IEEE Trans. Software Eng., vol. 47, no. 10, pp. 2242-2259, 2021.

[66] A. Decan, T. Mens, and P. Grosjean, "An empirical comparison of dependency network evolution in seven software packaging ecosystems," Empir. Softw. Eng., vol. 24, no. 1, pp. 381-416, 2019.

[67] https://github.com/dependabot/dependabot-core/issues/4146.

[68] R. Likert, “A technique for the measurement of attitudes.” Archives of Psychology, 1932.

[69] https://tools4dev.org/resources/how-to-choose-a-sample-size/.

[70] X. Tan, M. Zhou, and Z. Sun, "A first look at good first issues on GitHub," in ESEC/FSE '20: 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Virtual Event, USA, November 8-13, 2020. ACM, 2020, pp. 398-409.

[71] https://www.qualtrics.com/blog/ethical-issues-for-online-surveys/.

[72] Y. Zhao, A. Serebrenik, Y. Zhou, V. Filkov, and B. Vasilescu, "The impact of continuous integration on other software development practices: a large-scale empirical study," in Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017, Urbana, IL, USA, October 30 - November 03, 2017. IEEE Computer Society, 2017, pp. 60-71.

[73] N. Cassee, B. Vasilescu, and A. Serebrenik, "The silent helper: The impact of continuous integration on code reviews," in 27th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2020, London, ON, Canada, February 18-21, 2020. IEEE, 2020, pp. 423-434.

[74] J. Romano, J. D. Kromrey, J. Coraggio, and J. Skowronek, "Appropriate statistics for ordinal level data: Should we really be using t-test and Cohen's d for evaluating group differences on the NSSE and other surveys," in Annual Meeting of the Florida Association of Institutional Research, vol. 177, 2006, p. 34.

[75] S. Prion and K. Haerling, "Making sense of methods and measurement: Spearman-rho ranked-order correlation coefficient," Clinical Simulation in Nursing, vol. 10, no. 10, pp. 535-536, 2014.

[76] P. Sturgis, C. Roberts, and P. Smith, "Middle alternatives revisited: How the neither/nor response acts as a way of saying "I don't know"?" Sociological Methods & Research, vol. 43, no. 1, pp. 15-38, 2014.

[77] Pull Request #259 of dropbox/stone.

[78] Pull Request #3155 of tuist/tuist.

[79] Pull Request #663 of ros-tooling/action-ros-ci.

[80] Commit #b337b5f of justeat/httpclient-interception.

[81] Pull Request #1260 of asynkron/protoactor-dotnet.

[82] Commit #a06b04e of Azure/bicep.

[83] S. H. Khandkar, “Open coding,” University of Calgary, vol. 23, p. 2009, 2009.

[84] R. J. Passonneau, "Measuring agreement on set-valued items (MASI) for semantic and pragmatic annotation," in Proceedings of the Fifth International Conference on Language Resources and Evaluation, LREC 2006, Genoa, Italy, May 22-28, 2006. European Language Resources Association (ELRA), 2006, pp. 831-836.

[85] K. Krippendorff, Content Analysis: An Introduction to its Methodology. Sage publications, 2018.

[86] Pull Request #134 of skytable/skytable.

[87] Comment from Issue #1190 of dependabot/dependabot-core.

[88] Issue #1296 of dependabot/dependabot-core.

[89] Pull Request #2635 of giantswarm/happa.

[90] Commit #8cecf22 of Fate-Grand-Automata/FGA.

[91] Pull Request #1976 of stoplightio/spectral.

[92] Issue #1736 of dependabot/dependabot-core.

[93] Issue #1297 of dependabot/dependabot-core.

[94] Issue #202 of nitzano/gatsby-source-hashnode.

[95] Issue #26 of replygirl/tc.

[96] Pull Request #1987 of stoplightio/spectral.

[97] Pull Request #2916 of codalab/codalab-worksheets.

[98] Pull Request #126 of lyft/clutch.

[99] Pull Request #3622 of video-dev/hls.js.

[100] Comment from Issue #1973 of dependabot/dependabot-core.

[101] Issue #60 of ahmadnassri/action-dependabot-auto-merge.

[102] https://github.blog/changelog/label/dependabot/.

[103] https://docs.renovatebot.com/.

[104] https://docs.renovatebot.com/merge-confidence/.

[105] M. Zimmermann, C. Staicu, C. Tenny, and M. Pradel, "Small world with high risks: A study of security threats in the npm ecosystem," in 28th USENIX Security Symposium, USENIX Security 2019, Santa Clara, CA, USA, August 14-16, 2019. USENIX Association, 2019, pp. 995-1010.

[106] A. Godulla, M. Bauer, J. Dietlmeier, A. Lück, M. Matzen, and F. Vaaßen, "Good bot vs. bad bot: Opportunities and consequences of using automated software in corporate communications," 2021.

[107] Z. Peng and X. Ma, "Exploring how software developers work with mention bot in GitHub," CCF Trans. Pervasive Comput. Interact., vol. 1, no. 3, pp. 190-203, 2019.

[108] D. Foo, H. Chua, J. Yeo, M. Y. Ang, and A. Sharma, "Efficient static checking of library updates," in Proceedings of the 2018 ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/SIGSOFT FSE 2018, Lake Buena Vista, FL, USA, November 04-09, 2018. ACM, 2018, pp. 791-796.

[109] L. F. Cortes-Coy, M. L. Vásquez, J. Aponte, and D. Poshyvanyk, "On automatically generating commit messages via summarization of source code changes," in 14th IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2014, Victoria, BC, Canada, September 28-29, 2014. IEEE Computer Society, 2014, pp. 275-284.

[110] L. Moreno, G. Bavota, M. D. Penta, R. Oliveto, A. Marcus, and G. Canfora, "ARENA: an approach for the automated generation of release notes," IEEE Trans. Software Eng., vol. 43, no. 2, pp. 106-127, 2017.

[111] D. Poole, A. Mackworth, and R. Goebel, Computational Intelligence: A Logical Approach. Oxford University Press, 1998.

Runzhi He is currently an undergraduate student at the School of Electronics Engineering and Computer Science (EECS), Peking University. His research mainly focuses on open source sustainability and software supply chain. He can be contacted via rzhe@pku.edu.cn

Hao He is currently a Ph.D. student at the School of Computer Science, Peking University. Before that, he received his B.S. degree in Computer Science from Peking University in 2020. His research addresses socio-technical sustainability problems in open source software communities, ecosystems, and supply chains. More information can be found on his personal website https://hehao98.github.io/ and he can be reached at hehao98@pku.edu.cn.

Yuxia Zhang is currently an assistant professor at the School of Computer Science and Technology, Beijing Institute of Technology (BIT). She received her Ph.D. in 2020 from the School of Electronics Engineering and Computer Science (EECS), Peking University. Her research interests include mining software repositories and open-source software ecosystems, mainly focusing on commercial participation in open-source. She can be contacted at yuxiazh@bit.edu.cn.

Minghui Zhou received the BS, MS, and Ph.D. degrees in computer science from the National University of Defense Technology in 1995, 1999, and 2002, respectively. She is a professor in the School of Computer Science at Peking University. She is interested in software digital sociology, i.e., understanding the relationships among people, project culture, and software products through mining the repositories of software projects. She is a member of the ACM and IEEE. She can be reached at zhmh@pku.edu.cn.